CONSENSUS CONFIGURATIONAL BIAS MONTE CARLO METHOD AND SYSTEM FOR
PHARMACOPHORE STRUCTURE DETERMINATION

This specification includes in Sec. 8 computer program
listings that are exemplary embodiments of the computer
programs of this invention.

A portion of the disclosure of this patent document
contains material which is subject to copyright protection.
The copyright owner has no objection to the facsimile
reproduction by anyone of the patent disclosure, as it
appears in the Patent and Trademark Office patent files and
records, but otherwise reserves all copyright rights
whatsoever.

This invention was made with Government support under 
Grant number 1R43CA62752-01 awarded by the National 
Institutes of Health. The Government has certain rights in 
the invention. 



1. FIELD OF THE INVENTION

The field of this invention is computer-assisted methods
of drug design. More particularly, the field of this
invention is computer-implemented smart Monte Carlo methods
[...] interest as
inputs to determine highly accurate molecular structures that
must be possessed by a drug in order to achieve an effect of
interest. Illustrative [...]
et al., 5,307,287 to Cramer, III et al., 5,241,470 to Lee et
al., and 5,265,030 to Skolnick et al.

Protein interactions have recently emerged as a
fundamental target for pharmacological intervention. For
example, the top two major uncured diseases in the United
States are atherosclerosis (the principal cause of heart
attack and stroke) and cancer. These diseases are
responsible for greater than 50% of all U.S. mortality and
cost the U.S. economy over $200 billion per year. A
consistent picture of these diseases, which has gradually
emerged during the past ten years of molecular biological and
medical research, views both as triggered by disordering of
specific molecular recognition events that take place among
sets of proteins present in both the normal and disease
states.

Hierarchical, organized patterns of protein-protein
interactions are often referred to as "pathways" or
"cascades." At the molecular level, cancers have been
determined to be the deregulation of pathways of interacting
proteins responsible for guiding cellular growth and
differentiation. During the past year, individual cellular
events have been organized into nearly complete mechanistic
explanations of how a cell's behavior is controlled by its
environment and how communication pathway errors lead to
uncontrolled proliferation and cancer. Disruptions in similar
pathways are responsible for the proliferation of blood
vessel walls marking the atherosclerotic disease state (Cook
et al., 1994, Nature 369:361-362; Hall, 1994, Science
[...]; Zhang et al.,
1993, Nature 364:308-313).

Inhibition or stimulation of particular protein-substrate
interactions have long been known drug targets.
Many important anti-h[...]
analogues, antibiotics, and chemotherapeutic agents act in
this fashion. Captopril, an antihypertensive drug, was
designed based on its ability to antagonize a focal
blood-pressure-regulating enzyme.

Proteins involved in biological processes, either as
part of protein-protein pathways or as enzymes, are composed
of domains (Campbell et al., 1994, Trend. BioTech.
12:168-172; Rothberg et al., 1992, J. Mol. Biol.
227:367-370). Domains, or regions of the protein of stable
three-dimensional (secondary and tertiary) structure, play
several major roles, including providing on their surface
small regions (examples of "targets") where proteins and
substrates are able to bind and interact, and functioning as
structural units holding other domains together as part of a
large protein (tertiary and quaternary structure). The
interaction surface of a domain or target is fundamental to
determining binding specificity. Targets are often small
enough that the principal contribution to the binding energy
is short range, highly localized to several amino acids
(Wells, 1994, Curr. Op. Cell Biol. 6:163-174). The
functional specificity of targets and domains, responsible
for the incredible diversity of cellular function, ultimately
rests with the arrangement of amino acid side chains forming
their interaction surfaces, or targets (Marengere et al.,
1994, Nature 369:502-505).

It can be appreciated, therefore, that pharmacological
intervention affecting the specific protein-protein and
protein-substrate recognition events occurring at protein
targets is of fundamental importance, particularly for
effective drug design.

However, achieving desired pharmacological interventions
in a predictable manner remains as elusive as ever. Early
approaches to drug design depended on the chance observation
of biological effects of a [...] or the screening of
large numbers of exotic compounds, usually derived from
natural sources, for any biological effects. The nature of
the actual protein target was usually [...]

2.1. TARGET STRUCTURE-BASED APPROACHES TO DRUG DESIGN

Rational approaches to drug design have met with only
limited success. Current rational approaches are based on
first determining the entire structure of the proteins
involved in particular interactions, examining this structure
for the possible targets, and then predicting possible drug
molecules likely to bind to the possible target. Thus the
location of each of the thousands of atoms in a protein must
be accurately determined before drug design can begin.
Direct experimental and indirect computational methods for
protein structure determination are in current use. However,
none of these methods appears to be sufficiently accurate for
drug design purposes according to current rational
approaches.

The primary direct experimental methods for determining
the structure of proteins involved in particular interactions
are X-ray crystallography, relying on the interaction of
electron clouds with X-rays, and liquid nuclear magnetic
resonance (NMR), relying on correlations between polarized
nuclear spins interacting via indirect dipole-dipole
interactions. X-ray methods provide information on the
location of every heavy atom in a crystal of interest
accurate to 0.5-2.0 Å (1 Å = 10^-8 cm). Drawbacks of X-ray
methods include difficulties in obtaining high-quality
crystals, expense and time associated with the
crystallization process, and difficulties in resolving
whether or not the structure of the crystalline forms is
representative of the in vivo conformation (Clore et al.,
1991, J. Mol. Biol. 221:47; Shaanan et al., 1992, Science
227:961-964). High resolution, multidimensional, liquid
phase NMR techniques represent an attractive alternative, to
the extent that they can be applied in situ (i.e., in aqueous
environment) to the study of small protein domains (Yu et
al., 1994, Cell 76:933-945). However, the complexity of the
analysis of the various [...] time
consuming, and the correlations (primarily from the nuclear
Overhauser effect) provide no better accuracy than X-ray
methods. Isotopic enrichment of proteins with 13C and 15N
reduces the time associated with analysis, but at a great
expense (Anglister et al., 1993, Frontiers of NMR in Biology
III [...]).

Protein structures determined by any of these current
methods do not predict success in subsequent drug design.
Resolution obtainable either by measurement or computation,
generally 0.5-2 Å, has often been found to be inadequate for
effective direct drug design or for selection of a lead
compound from organic compound libraries. The resolution
required to understand both drug affinity and drug
specificity, although not precisely known, is probably
measured in fractions of an Å, down to 0.1 Å (MacArthur et
al., 1994, Trend. BioTech. 12:149-153). This accuracy
appears to be beyond the capabilities of many current
methodologies.

Prior research has identified tools which, although
promising, cannot be used in a coordinated manner for drug
design. One promising measurement approach with speed,
simplicity, accuracy, and the ability to carefully control
the measurement environment is rotational echo double
resonance (REDOR) NMR, a type of solid state NMR (Gullion
and Schaefer, 1989, J. Magnetic Resonance 81:196; Holl et
al., 1990, J. Magnetic Resonance 81:620-626 and McWherter,
1993, J. Am. Chem. Soc. 115:238-244). REDOR accuracy can be
below the 0.1 Å believed to be sufficient for direct drug
design. However, since REDOR measures only a few selected
distances, it is not usable in drug design methods which
depend on the initial determination of the complete structure
of the protein containing the target of interest.

Once a target's structure is determined by the above
methods, most rational drug design paradigms call for the
prediction of small drug structures that will bind (or dock)
to the target. This prediction is generally done by
computational methods, of which several are in current use.
Most seek to predict the position of all the thousands of
atoms in a drug structure. Purely ab initio computational
approaches to high resolution structure analysis, such as
quantum statistical mechanics and molecular dynamics, require
prohibitive computing resources. To apply either approach,
the potential energy, or Hamiltonian, of the entire system
must be known. Statistical mechanics provides an expression
for the probability of any given protein configuration as a
ratio of partition functions. Proper quantum statistical
mechanics required for an exact evaluation of full protein
partition functions is not currently computationally
feasible, as it would involve many thousands of atoms
including the target, the protein, and the aqueous
environment. The application of even simple, approximate
quantum statistical mechanics to simple systems in aqueous
environments is currently a non-trivial task (Chandler, 1991,
in Liquids, Freezing, and Glass Transitions, Elsevier, NY, p.
195). Molecular dynamics computes the dynamics of a
molecule's motion in time. Computing the atomic dynamics of
all the perhaps thousands of atoms of a protein is an extreme
computational burden. Only picoseconds, or at most a few
nanoseconds, of molecular time can be simulated, which is
insufficient to determine a high resolution, equilibrium
structure (Smit et al., 1994, J. Phys. Chem. 98:8442-8452).
In any case, most of the information determined is wasted,
since only the structure of the protein binding target is of
interest in drug design.
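
For reference, the statement that statistical mechanics expresses the probability of a configuration as a ratio of partition functions can be written out in the classical canonical ensemble. This is a standard textbook form offered as an illustration only. The symbols (x for a configuration, H for the Hamiltonian, A for a region of configuration space, Z and Z_A for the full and restricted partition functions) are notation assumed here and do not appear in the original text.

```latex
% Probability density of a single configuration x at temperature T
P(x) = \frac{e^{-\beta H(x)}}{Z},
\qquad
Z = \int e^{-\beta H(x')}\,dx',
\qquad
\beta = \frac{1}{k_B T}

% Probability of finding the system in a region A of configuration space,
% written as a ratio of a restricted to a full partition function
P(A) = \frac{Z_A}{Z},
\qquad
Z_A = \int_{A} e^{-\beta H(x')}\,dx'
```

Evaluating Z requires integrating over every degree of freedom of the target, the protein, and the solvent, which is why the text describes the exact calculation as computationally infeasible.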

Further, current approximate computational techniques
for protein structure determination are in need of greater
accuracy or efficiency. The most common techniques depend on
Molecular Dynamics or Monte Carlo methods (Nikiforovich,
1994, Int. J. Peptide Protein Res. 44:513-531; Brunger and
Karplus, 1991, Acc. Chem. Res. 24:54-61). These methods
randomly alter initial molecular structures by generating
simulated thermal perturbations, and then average the
ensemble of results to determine a final structure. The
generated perturbations must satisfy [...]
constraints and be energetically favorable. Unless both
conditions are met, the perturbation is discarded.
Current Monte Carlo methods applied to constrained protein
structure determinations productively use only approximately
1 out of 10^[...] perturbed structures generated (Siepmann et al.,
1993, Nature 365:330-332). This extreme waste of computer
resources results in time consuming, low resolution structure
determinations.
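
The accept-or-discard loop described above can be sketched as follows. This is a minimal illustration using the standard Metropolis acceptance rule on a toy system. It is not the configurational bias method of the invention, and the energy function, the box constraint, and all names and parameter values are assumptions chosen only to make the example runnable.

```python
import math
import random


def metropolis_step(coords, energy, satisfies_constraints, beta, step, rng):
    """Attempt one random perturbation; return (coords, accepted)."""
    trial = list(coords)
    i = rng.randrange(len(trial))
    trial[i] += rng.uniform(-step, step)  # simulated thermal perturbation

    # Condition 1: the perturbed structure must satisfy the constraints.
    if not satisfies_constraints(trial):
        return coords, False

    # Condition 2: Metropolis test on the energy change (assumed rule).
    delta = energy(trial) - energy(coords)
    if delta <= 0.0 or rng.random() < math.exp(-beta * delta):
        return trial, True
    return coords, False


def run(n_steps, seed=0):
    """Sample a toy system; return (ensemble-average coords, acceptance rate)."""
    rng = random.Random(seed)
    energy = lambda x: sum(v * v for v in x)           # toy harmonic energy
    in_box = lambda x: all(abs(v) <= 1.0 for v in x)   # toy constraint
    coords = [0.5, -0.5, 0.25]
    sums = [0.0] * len(coords)
    accepted = 0
    for _ in range(n_steps):
        coords, ok = metropolis_step(coords, energy, in_box, 2.0, 0.3, rng)
        accepted += ok
        sums = [s + v for s, v in zip(sums, coords)]
    mean = [s / n_steps for s in sums]
    return mean, accepted / n_steps
```

Calling `run(2000)` returns the averaged coordinates and the fraction of perturbations that were used. In a tightly constrained system that fraction becomes very small, which is the inefficiency the text attributes to current methods.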

To summarize, existing rational drug design methods
based on identification of target structure fail to reliably
yield drug molecules due to experimental structure
determination difficulties and computational difficulties
associated with predicting drug structures with ill-defined
Hamiltonians.

2.2. DIVERSITY-BASED APPROACHES TO DRUG DESIGN

Another method for exploring protein target interactions
utilizes "recognition systems" which comprise huge libraries
of related molecules (Clarkson et al., 1994, Trend. BioTech.
12:173-184). From such a library only those members binding
to the target of interest are selected. Such recognition
systems must encompass the structural diversity of protein
targets while being amenable to the selection of
lead compounds for drug design. Antibodies are one classic
example of such a system that certainly meets the recognition
requirement. Unfortunately, there is a need to determine the
antibody structures needed for lead compound selection more
rapidly and accurately. While about 2000 recognition regions
have been sequenced, only about 23 in the Brookhaven Protein
Structural Database have structures determined to even within
2 Å (Rees et al., 1994, Trends in Biotech. 12:199-206).

Promising recognition systems at the opposite extreme
comprise huge libraries of small peptides. The small
peptides must be sufficiently [...] so that they attain a
level of affinity and specificity similar to that obtained by
protein domains. Given the role peptides play in nature,
this condition can be met by surprisingly small structures,
with 6 to 12 amino acids. However, linear peptides are either
unstructured or weakly structured at room temperature in
aqueous solutions (Alberg et al., 1993, Science 262:248;
Skalicky et al., 1993, Protein Science 10:1591-1603). From a
practical viewpoint, linear peptides must be constrained to
reduce their degrees of freedom (reduced conformational
entropy) and to increase their chances for strongly binding.
These constraints, or scaffolds, limit the range of stable
conformations and make determining bound structure more
straightforward (Olivera et al., 1990, Science 249:259; Tidor et
al., 1993, Proteins: Structure Function and Genetics 15:71).
Methods are now available to create such libraries and
to select library members that recognize a specific protein
target. The production of constrained peptide diversity
libraries requires synthesizing oligonucleotides with the
desired degeneracy to code for the peptides and ligating them
into selection vectors (Goldman et al., 1994, Bio/Tech.
10:1557-1561). Once a constrained structured diversity
library is created, it is a source from which to select
specific members that bind to a target of interest. Beginning
with a known pathway involving specific domain-domain or
protein-substrate interactions at a target, molecular
biological methods can be used to identify in a matter of
days small ensembles of highly constrained peptides from
these huge libraries that bind to these domains with high
affinity and specificity.

While this field has been exploding in the last few
years and showing great potential, it is severely limited by
its use in isolation without the benefit of integrated
structural analysis needed both to derive the high resolution
structures of binding peptides and also to direct the
construction of additional structured libraries. Drug design
is not aided by having library members recognizing the
protein target of interest but without any understanding of
why the recognition occurs. This is entirely similar to the
random screening methods of early fortuitous drug design
efforts.

Unfortunately, rational drug design according to current
approaches (target structure-based) remains an inefficient,
laborious process with a disproportionately high
lead-compound failure rate. Presently, about 90% of lead
compounds fail to emerge successfully from clinical trials
(Trends in U.S. Pharmaceutical Sales and Research and
Development, Pharmaceutical Manufacturing Association,
Washington, D.C., 1993).

It is becoming clear that low-resolution structures of
an entire protein or target (at 0.5-2 Å), or an
uncharacterized lead, such as produced by chemical diversity
methods, leave much to be desired for use in drug design.

If the limitations of prior art methods were overcome
and a sufficiently accurate structure needed by a molecule to
bind to a target of interest could be determined, existing
chemical libraries could be searched for highly targeted lead
compounds with similar structure (Martin, 1992, J. Medicinal
Chem. 35:2145-2154). This database search can be based not
only on chemical and electronic properties, but also on
geometric information. Such searches, at high
resolution (better than 0.25 Å), would provide a vast
improvement over the prior art, as lower resolutions lead to
an exponentially increasing number of potential leads.
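
To make the role of geometric resolution concrete, the sketch below filters a small library by comparing labelled inter-group distances against a query within a tolerance. The compound names, group labels, and distances are invented for illustration and are not data from the source. The 0.25 Å default mirrors the resolution figure quoted above.

```python
def matches(query, candidate, tol=0.25):
    """True if every query distance (in angstroms) is reproduced within tol."""
    return all(
        pair in candidate and abs(candidate[pair] - dist) <= tol
        for pair, dist in query.items()
    )


def search(query, library, tol=0.25):
    """Return names of library entries whose geometry matches the query."""
    return [name for name, dists in library.items() if matches(query, dists, tol)]


# Hypothetical pharmacophore query: distances between chemical groups.
query = {("OH", "NH3"): 5.10, ("OH", "COO"): 3.40}

# Hypothetical library entries with the same labelled distances.
library = {
    "cmpd_a": {("OH", "NH3"): 5.20, ("OH", "COO"): 3.55},
    "cmpd_b": {("OH", "NH3"): 5.60, ("OH", "COO"): 3.40},
    "cmpd_c": {("OH", "NH3"): 4.90, ("OH", "COO"): 3.30},
}

hits_fine = search(query, library)             # tight tolerance
hits_coarse = search(query, library, tol=1.0)  # loose tolerance
```

With the tight tolerance only two of the three toy entries pass, while the loose tolerance admits all of them, which is the qualitative effect the text describes: coarser geometry yields more candidate leads to sift.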

Computational methods to determine high resolution drug
structures from recognition system binding information or NMR
partial distance measurements are not currently available.
No current structure determination method uses such
additional information to make more efficient or more
accurate determination of high resolution structures
(Holzman, 1994, Amer. Sci. 872:267).

Citation of a reference or discussion hereinabove shall
not be construed as an admission that such is prior art to
the present invention.



3. SUMMARY OF THE INVENTION

It is a broad object of this invention to address the
prior art problems of drug design by providing a method of
rational design of drugs that achieve their effect by binding
to a target molecule or molecular complex of interest.
Importantly, this object is achieved without requiring
determination of the structure of the molecule or molecular
complex ("target molecule") bearing the target or even of the
target itself. The method is target structure independent.
The method of the invention uses an interdisciplinary
combination of computational modeling and simulation,
experimental distance constraints, and molecular biology.
In an important aspect, the invention provides a
computer implemented modeling and simulation method to
determine a highly accurate consensus structure for the
pharmacophore and a structure for the remainder of the
molecule from diversity library members that bind to the
protein target of interest. Where prior structure
determination methods focused on the structure of the target
molecule or of the target, the method of this invention is
uniquely adapted to focus instead on the structures of
molecules that bind to the target. Such structural
information is directly applicable to drug design since it
defines the structure a drug must possess to bind to the
target of interest. Also, this structural information is
much easier to determine by use of the present invention,
since it concerns molecules with many fewer atoms than the
target molecule. The method of the invention achieves
accuracy by improving upon the accuracy and utility of the
input structural information. In a further embodiment of the
invention, the method employed for structural determination
is a smart Monte Carlo technique adapted to small constrained
molecules.

The structure determination method of the invention
allows one to take maximum advantage of the information
obtained from the molecular biological selection of the
diversity library members that tightly and specifically bind
to the target molecule of interest. The selected library
members must share some common structure to bind to the same
target molecule. The smart Monte Carlo computer method of
this invention specifically seeks and provides this common
structure.

The invention also provides a method of performing REDOR
NMR measurements of molecules on a solid phase substrate. In
a preferred embodiment, the substrate is a solid phase on
which the molecule (e.g., peptide) has been synthesized, with
a high degree of purity. In another preferred embodiment,
performing REDOR measurements of such a molecule on a
substrate can be done in a dry nitrogen atmosphere, under
hydrated conditions, and when the molecule is either free or
bound to a target. In a specific embodiment, the REDOR
measurements are accurate to better than 0.05 Å from 0 to 4
Å, and to better than 0.1 Å from 4 to 8 Å. In an
advantageous aspect of the invention, the structure
determination method makes maximum use of these highly
accurate internuclear distance measurements to constrain the
determined common structure for the binding library members.

The invention also provides methods of identifying a
compound that specifically binds to a target molecule, by
first screening a diversity library, and then using a genetic
selection method for screening the compounds identified from
the diversity library.

In broad aspects, the invention provides a method and
apparatus for rational and predictable design of new and/or
improved drugs that achieve their effect by binding to a
specified target molecule. More particularly, the invention
is directed to a method for the rational selection of highly
specific lead compounds for such drug design, including the
computer implemented step of highly accurate determination of
the structure responsible for this target binding by the
highly accurate consensus configurational bias Monte Carlo
method.

A lead compound serves as a starting point for drug
development both because it binds to the protein
target of interest, achieving the biological effect of
interest, and because it has or can be modified to have good
pharmacokinetics and medicinal applicability. A final drug
may be the lead compound or may be derived therefrom by
modifying the lead to maximize beneficial effects and
minimize harmful side-effects. Although any lead compound is
useful, a lead that tightly and specifically binds to the
target molecule of interest in a known manner, such as can be
provided by the invention, is of great use. Knowledge of the
high resolution structures in a lead compound responsible for
its binding and activity provides a more focused and
efficient drug development process.
The methods of the invention improve lead compound
determination by determining the "pharmacophore", the
precise structural characteristics needed for a lead compound
to specifically bind to a target of interest. The most
fundamental specification of a pharmacophore is in terms of
the electronic properties necessary for a molecule to
specifically bind to the surface of a target molecule. These
properties may be fundamentally represented by requirements
on the ground and low lying excited state wave functions of a
pharmacophore, such as, for example, by specifying
requirements on the well known multipole expansion of these
wave functions.

The preferred pharmacophore specification according to
the invention is in terms of both the chemical groups making
up the pharmacophore and determining its electronic
properties and also the geometric relationships of these
groups. This chemical representation is not the only
possible representation of the pharmacophore. Several
chemical arrangements may have similar electronic properties.
For example, if a pharmacophore specification included an -OH
group at a particular position, a substantially equivalent
specification might include an -SH group at the same
position. Equivalent chemical groups that may be substituted
in a pharmacophore specification without substantially
changing its nature are called "homologous".

In particular embodiments, therefore, this invention
provides a method and apparatus for the highly accurate
determination of the pharmacophore needed to specifically
bind to the target molecule of interest, by a specification
of the geometric relationships of the important chemical
groups. The pharmacophore is preferably determined by a
smart Monte Carlo method from molecular biological input
specifying molecules (preferably selected from among
diversity libraries) that specifically bind to the target
molecule and also preferably from REDOR NMR data specifying a
few highly accurate distances in these selected molecules.
An important advantage provided by the invention is the
ability to make a pharmacophore structure determination
without relying on any knowledge of the structure of the
target molecule or target. Where the target molecule is a
protein, conventional prior art methods have sought to
sequence and determine the structure of the protein
containing the target, hoping thereby to determine active
sites by examination of the structure. A further important
advantage of the invention is that this structure
determination can be made by use of a relatively small number
of actual physical position measurements. In contrast,
conventional methods using X-ray crystallography and liquid
NMR require determination of positions of all atoms in the
molecule ("binder") that specifically binds to the target,
and the target. An additional advantage provided by the
invention is that, in a preferred embodiment wherein REDOR
structural measurements provide input information, the
accuracy of the pharmacophore structure determination can be
at least approximately 0.25-0.50 Å or better. This accuracy
is provided by the combination of an efficient Monte Carlo
technique for structure determination with a few highly
accurate distance determinations.



4. BRIEF DESCRIPTION OF THE DRAWINGS
These and other features [...] of the
present invention will become better understood by reference
to the accompanying drawings, following description, and
appended claims, where:

Fig. 1 is the overall method of this invention in its
broadest aspect;

Fig. 2A and 2B are more detail for the step of Fig. 1
for selecting candidate pharmacophore structures;

Fig. 3 is more detail for the step of Fig. 1 for
performing distance measurements;

Fig. 4 is more detail for the step of Fig. 3 for
performing NMR measurements;

Fig. 5 is REDOR NMR signal response details for the step of
Fig. 3 of data analysis;

Fig. 6 is sample REDOR NMR spectra according to the
method of Fig. 3;

Fig. 7 is sample data analysis according to the method
of Fig. 3;

Fig. 8 is more detail for the step of Fig. 1 for
configurational bias Monte Carlo structure determinations;

Fig. 9 is a sample of simulation completion data;

Fig. 10 is further detail of peptide memory
representation used in the method of Fig. 8;

Fig. 11 is additional detail of peptide memory
representation used in the method of Fig. 8;

Fig. 12 is more detail for the step of Fig. 8 of
processor generation of proposed modified structures by Type
I moves;

Fig. 13 is more detail for the step of Fig. 8 of
processor generation of proposed modified structures by Type
II moves;

Fig. 14 is additional detail for the step of Fig. 8 of
processor generation of proposed modified structures by Type
II moves;

Fig. 15 is a structure for implementing the method of
Fig. 8;

Fig. 16 is the main program structure of Fig. 15;

Fig. 17 is the structure modification program structure
of Fig. 15;

Fig. 18A and 18B are the Type I move generator program
structure of Fig. 17;

Fig. 19A and 19B are the Type II move generator program
structure of Fig. 17.

5. DETAILED DESCRIPTION
For clarity of disclosure, and not by way of limitation,
the detailed description of the invention is described as a
series of steps. A broad view of the exemplary steps of
which the invention is comprised is presented in Fig. 1, a
brief overview of which is presented in the text that
follows.

The invention method preferably begins with a target
molecule (or molecular complex) 1 having a binding target of
biological or pharmacological interest. Specific binding of
a molecule to the target is predicted to affect its
biological activity and may provide biological effects of
interest. For example, these effects might include
amelioration of a disease process or alteration of a
physiological response. Lead compounds 8 output from the
invention are able to specifically bind to target molecule 1
and can serve as starting points for the design of a drug
able to specifically bind to the target.

Diversity library screening, step 2, allows the
selection from among library members of a plurality of
molecules [hereinafter called "binders"] that specifically
bind to target molecule (or molecular complex) 1; the
chemical building block structure (e.g., sequence, structural
formula) is then determined. If predetermined binders and
their structure are already available, the invention can use
this information directly without the need for library
screening. [...]
libraries may be screened. The selected binders all share a
common pharmacophore structure, allowing their specific
binding to the target in a [...]
manner. This common structure is preferably iteratively
determined by a select and test method. Candidate
pharmacophore selection, step 3, is based upon chemical
structure homologies. Geometric and conformational
information need not be used at this step and is
preferably not considered. A candidate pharmacophore shared
by all the N binders is selected, step 3, for structure
determination by subsequent steps. The binders will
typically present several candidate chemical pharmacophores,
ignoring conformation considerations. These candidates are
small groups of library building blocks, often contiguous,
each candidate group in one binder being homologous to the
candidate groups in all the other binders. Building block
homologies are determined by applying rules appropriate to
the diversity library. In the preferred embodiment,
homologous building blocks have similar surface chemical
groups, since pharmacophores are defined by a similar
geometric arrangement of chemical structures. In the case of
the preferred library, CX^C, candidate pharmacophores are
amino acid sequences whose side chain surface groups have
similar chemical properties. Amino acid homologies are
determined by mechanical rules described below. These
candidate sequences are typically 3 amino acids long, but may
range from 2 all the way to 6. Where pharmacophores are
defined by their charge distributions, homologous library
building blocks must have similar charge distributions.
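
The candidate selection of step 3 can be illustrated with a short sketch that looks for contiguous runs of 2 to 6 residues whose homology classes recur in every binder. The homology classes and the example sequences below are hypothetical placeholders. The actual homology rules are described elsewhere in the specification, and this sketch handles only the contiguous case.

```python
# Hypothetical homology classes grouping amino acids by side-chain chemistry.
_CLASSES = {
    "acidic": "DE", "basic": "KRH", "hydroxyl": "ST", "amide": "NQ",
    "aromatic": "FWY", "aliphatic": "AVLIM", "small": "G",
    "proline": "P", "thiol": "C",
}
HOMOLOGY_CLASS = {aa: name for name, members in _CLASSES.items() for aa in members}


def signatures(sequence, k):
    """Set of homology-class tuples for every contiguous window of length k."""
    return {
        tuple(HOMOLOGY_CLASS[aa] for aa in sequence[i:i + k])
        for i in range(len(sequence) - k + 1)
    }


def candidate_pharmacophores(binders, min_len=2, max_len=6):
    """Map window length -> class patterns present in every binder."""
    found = {}
    for k in range(min_len, max_len + 1):
        common = set.intersection(*(signatures(b, k) for b in binders))
        if common:
            found[k] = sorted(common)
    return found


# Hypothetical binder sequences (one-letter amino acid codes).
binders = ["ARGDSW", "FKGETL", "GRGDTP"]
found = candidate_pharmacophores(binders)
```

For these toy sequences the longest shared pattern is basic, small, acidic, hydroxyl (for example R-G-D-S, K-G-E-T, and R-G-D-T), and the shorter shared patterns are its sub-windows. Each entry in `found` would be a candidate to pass to the distance measurement and structure determination steps.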

Having selected N binders by screening one or more
libraries and determined a candidate pharmacophore in each
binder, the subsequent steps of distance measurement, step 4,
and Monte Carlo structure determination, step 5, determine a
highly accurate structure for the candidate pharmacophore, if
possible. This determination will be possible if the
candidate is the actual pharmacophore. A subsequent test,
step 6, [...] determination.
In particular cases, distance measurements may not be
necessary in order to determine an adequately precise
pharmacophore structure.

Measurements are made, step 4, of a few strategic
distances in the binders that will be most useful for the
subsequent structure determination step. A minimum number of
strategic interatomic distances in the binders are measured
in step 4. These few distances constrain possible binder
structures and make the subsequent complete structure
determination more efficient and more accurate. In preferred
but not limiting embodiments, measurement methods yielding
distances accurate to approximately 0.25 Å or better
are used. The preferred methods use nuclear magnetic
resonance ["NMR"] techniques. Particularly preferred is the
rotational-echo double resonance ["REDOR"] NMR method for
directly measuring 13C-15N internuclear distances in peptides,
the most accurate current method for simply and inexpensively
obtaining such distances. It is generally capable of
accuracy to 0.1 Å and a span of 8 Å. In a specific
embodiment, peptide binders are synthesized from amino acids
labeled with 13C and 15N. Labeling is chosen to obtain the
most useful distance data about the selected candidate
pharmacophore structures. Either backbone nuclei, side chain
nuclei, or both can be labeled. The step is detailed below.

Liquid NMR techniques can also be used to indirectly
determine internuclear distances in peptides, but are less
preferred since they require considerable data interpretation
to obtain distances of less accuracy than those obtained by
use of REDOR.

Structure determination, step 5, determines a precise
geometric conformation for both the candidate shared chemical
structures, if possible, and the remainder of the binders.
The preferred but not limiting method, consensus,
configurational bias, Monte Carlo ["CCBMC"] determination,
step 5, is an efficient smart Monte Carlo method uniquely
able to incorporate knowledge from prior steps to obtain
highly accurate physical binder structures. From library
screening, step 2, it is deduced that the binders have a
shared, actual pharmacophore structure because they all bind
specifically to the same target molecule (hence, a
"consensus" method). It is not significant to the method if
the binders come from more than one library as long as they
all have a structure adaptable to representation in the
consensus structure determination step (see infra). From
distance measurements, step 4, a few strategically chosen
distances are accurately known. This information is
heuristically utilized along with an accurate model of the
physical atomic interactions and the allowed molecular
conformations.

Further, these means are particularly adapted for
determining structures of molecules having limited
conformational degrees of freedom at the temperature of
interest and conformationally constrained by, e.g., internal
bonds. Potential conformations are generated and selected by
smart configuration bias techniques which avoid generation of
unnecessarily improbable new conformations. (Hence, a
"configuration bias" method.) The technique is preferably
applied herein to conformationally constrained peptides. A
concerted rotation technique is combined with configurational
bias conformation generation so that new conformations
automatically preserve the internally linked backbone
structure constraints. This technique is preferably applied
to the preferred constrained peptide library, of a sequence
comprising CX6C (wherein X is any amino acid). The technique
is also applicable to other constrained peptide libraries, to
peptoid libraries, and to any more general organic diversity
libraries that meet certain geometric limitations (i.e., that
have structures adaptable to representation in the consensus
structure determination step (see infra)).

The methods of the invention are not limited to the use
of CCBMC for determining a consensus pharmacophore structure.
Alternative embodiments of this invention may use alternative
structure determination methods to determine a consensus
pharmacophore structure. For example, a simple
method is to make exhaustive REDOR NMR measurements
characterizing the candidate pharmacophore in each binder and
then [illegible]. A somewhat less expensive
method is to use a conventional Monte Carlo molecular
structure determination method to limit somewhat the number
of REDOR NMR measurements required to characterize the
candidate pharmacophore. Conventional Monte Carlo methods,
being unable to directly make use of partial distance
measurements or consensus binding information, are less
efficient than the CCBMC method and require more distance
measurements. Further, other known techniques of molecular
structure determination, for example folding rules or
molecular dynamics, can be used in place of conventional
Monte Carlo.

The success of the structure determination is tested,
step 6, against various convergence and success criteria.
Consistency tests, step 6, are applied to the resulting
structure to determine whether the candidate pharmacophore
previously selected is the actual pharmacophore. One set of
tests checks predicted distances against new distance
measurements or against previous measurements temporarily not
used as structure constraints. A second set of tests checks
heuristically whether the candidate pharmacophore exhibits
the expected low energy consensus structure. The tests are
described further below. If a shared structure is found, the
candidate pharmacophore must be the actual pharmacophore. If
not, another candidate pharmacophore is selected and another
shared structure is determined, if possible. An actual
pharmacophore exists and will eventually be found and
accurately structured.
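
The first kind of consistency test, comparing predicted distances with measurements withheld from the structure determination, might be sketched as follows. This is a minimal illustration under stated assumptions: the predicted structure is given as Cartesian coordinates in Å keyed by hypothetical atom labels, the 0.25 Å tolerance is borrowed from the preferred measurement accuracy mentioned above, and the function name `check_held_out` and the sample data are invented for the example.

```python
import math


def check_held_out(coords, held_out, tol=0.25):
    """Compare predicted interatomic distances with withheld measurements.

    coords   -- {atom_label: (x, y, z)} predicted structure, in angstroms
    held_out -- {(label_i, label_j): measured_distance} not used as constraints
    tol      -- allowed absolute deviation in angstroms (assumed value)

    Returns (passed, worst_deviation).
    """
    worst = 0.0
    for (i, j), measured in held_out.items():
        predicted = math.dist(coords[i], coords[j])  # requires Python 3.8+
        worst = max(worst, abs(predicted - measured))
    return worst <= tol, worst


if __name__ == "__main__":
    # Invented coordinates and measurement, for illustration only.
    coords = {"13C_a": (0.0, 0.0, 0.0), "15N_b": (3.0, 4.0, 0.0)}
    passed, worst = check_held_out(coords, {("13C_a", "15N_b"): 5.1})
    print(passed, round(worst, 3))
```

A failed check would indicate that the candidate pharmacophore should be rejected and another candidate selected, as described above. The second, heuristic set of tests on the low energy consensus structure is not represented in this sketch.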

Upon passing these tests, the methods of the invention
have provided a consensus structure for the selected
candidate pharmacophore, preferably accurate to at least
approximately 0.25-0.50 Å, as well as structures for the
remainder of the binder molecules. Lead compound selection,
step 7, uses the consensus pharmacophore structure to determine
targeted lead compounds 8. One method of lead selection is
to design new organic molecules of pharmacologic utility with
the determined pharmacophore structure. Another method
selects leads from databases of molecular descriptions.
Conventionally known to medicinal chemists are databases of
potential drug compounds indexed by their significant
chemical and geometric structure (e.g., the Standard Drugs
File (Derwent Publications Ltd., London, England), the
Beilstein database (Beilstein Information, Frankfurt, Germany
or Chicago), and the Chemical Registry database (CAS,
Columbus, Ohio)). The determined pharmacophore, being a
chemical and geometric structure in the preferred embodiment,
is used to query such a database. Search results will be
those compounds with homologous chemical groups arrayed in a
very closely similar geometric arrangement. These are lead
compounds 8 output from this invention and input to the
process of drug testing and development.
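
A geometric database query of the kind described can be sketched in a few lines. The sketch assumes a toy in-memory "database" in which each compound is reduced to distances (in Å) between labeled chemical groups; the group labels, compound names, distances, and the functions `matches` and `search` are all hypothetical and do not correspond to the query interface of any of the databases named above.

```python
def matches(query, compound, tol):
    """True if every query distance is reproduced by the compound within tol angstroms."""
    return all(
        pair in compound and abs(compound[pair] - dist) <= tol
        for pair, dist in query.items()
    )


def search(query, database, tol=0.25):
    """Return names of compounds whose group geometry matches the query."""
    return [name for name, dists in database.items() if matches(query, dists, tol)]


if __name__ == "__main__":
    # Invented pharmacophore: three chemical groups and their pairwise distances.
    query = {("acid", "base"): 5.0, ("acid", "ring"): 6.5, ("base", "ring"): 4.0}
    # Invented compound records, for illustration only.
    database = {
        "cmpd_A": {("acid", "base"): 5.1, ("acid", "ring"): 6.4, ("base", "ring"): 4.2},
        "cmpd_B": {("acid", "base"): 5.6, ("acid", "ring"): 6.9, ("base", "ring"): 4.3},
        "cmpd_C": {("acid", "base"): 7.5, ("acid", "ring"): 6.5, ("base", "ring"): 4.0},
    }
    print(search(query, database, tol=0.25))
    print(search(query, database, tol=1.0))
```

In this toy data set the tight tolerance retrieves one compound and the relaxed tolerance retrieves two, which illustrates on a small scale why an accurately determined pharmacophore geometry yields a shorter list of leads.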

Although the preferred identity and ordering of the
method steps is presented in Fig. 1, the invention is not
limited to this identity and ordering. Other orderings,
especially of steps 3, 4, and 5, are possible to achieve
certain efficiencies. Steps can be inserted and deleted for
optimal effect. For example, an additional partial structure
determination step can be inserted between existing steps 3
and 4 to provide information on how best to make the step 4
strategic measurements. As another example, in an
alternative aspect, in lieu of screening one or more
libraries to select binders, predetermined binders can be
obtained and used (e.g., binders determined by any means to
be specific to the same target molecule); thus, step 2 can be
omitted. In another embodiment, step 4, the measurement
step, can be omitted. While all method steps in the
preferred embodiment assume an aqueous environment at body
temperature (37 °C), to the extent these parameters are
relevant to the particular step, the invention is not limited
to human environmental parameters.

Screening against a diversity library, step 2, involves
selecting by assay those library members which bind
specifically to the target molecule of interest. Binding
specificity is preferably a binding constant of less than 1 µM
(micromolar), and more preferably less than 100 nM
(nanomolar). Preferably, an assay is done that detects an
effect of binding of the binder to the target molecule on the
target molecule's biological activity, to ensure that the
binding is actually to the biological target of interest.
Also, preferably, the selected binders are tested to further
select those binders that bind to the target molecule
competitively, to ensure that each binds to the same target
in the target molecule.

The output of the screening step is a number, N, of
binders selected from one or more libraries for use by the
subsequent steps of the method. The binders with highest
affinity are preferably selected for use by the subsequent
steps. The chemical structure of each of the N binders
selected for use is determined as part of the member
synthesis and library screening. The primary chemical
structure of the preferred constrained peptide library is
specified by the amino acid sequence of the -X6- portion of
the CX6C molecule. For more general organic diversity
libraries, the selection and arrangement of library building
blocks in the binders must be determined.
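
The selection of the N highest-affinity binders from assay results can be sketched as below. The sketch reads the "binding constant" as a dissociation constant in mol/L, so that smaller values mean tighter binding; that reading, the function name `select_binders`, and the assay values are assumptions made for illustration.

```python
def select_binders(assay, n, max_kd_molar=1e-6):
    """Return up to n binder names with the lowest dissociation constants.

    assay        -- {binder_name: dissociation constant in mol/L}
    n            -- number of binders wanted for the later steps
    max_kd_molar -- specificity cutoff; 1e-6 corresponds to 1 micromolar,
                    1e-7 to the more preferred 100 nanomolar
    """
    hits = [(kd, name) for name, kd in assay.items() if kd < max_kd_molar]
    return [name for kd, name in sorted(hits)[:n]]


if __name__ == "__main__":
    # Invented assay results, for illustration only.
    assay = {"p1": 5e-8, "p2": 2e-6, "p3": 3e-7, "p4": 9e-9}
    print(select_binders(assay, n=2))
    print(select_binders(assay, n=5, max_kd_molar=1e-7))
```

The functional and competitive binding assays described above are separate experimental filters and are not modeled here.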

It is a preferred aspect of this invention that the set
of determined lead compounds is selective and small. Example
1 illustrates that as pharmacophore distance tolerances are
relaxed, the number of compounds retrieved by drug database
searches increases geometrically. As this invention
determines high resolution pharmacophore geometries, it can
be expected that database searches, or other methods of
determining leads from pharmacophore structure, will return
only a few, selective, targeted leads. Methods limiting the
number of leads decrease the cost of drug development and are
consequently of considerable utility to the pharmaceutical
industry and medical community. The expense of developing
and evaluating lead compounds for [illegible] effect and
medicinal usefulness is well known. Each lead compound must
be screened for pharmacological usefulness, efficacy, and
safety. [illegible]
process must be repeated. Finally, the required in vivo
pharmacologic toxicity and clinical trials alone can consume
years of time and millions of dollars.

Therefore, starting with a target molecule 1 having a
biologically or pharmacologically interesting target, the
method and apparatus of this invention determine a consensus
pharmacophore structure. This consensus pharmacophore
structure can then be used to determine a selective set of
highly specific lead compounds 8 (Fig. 1) for rational design
of drugs, e.g., capable of acting as ligand-mimics (agonists
or antagonists) for the particular target molecule.

In the following discussion and examples, each of these
steps will be more fully described.

5.1. SELECTION OF A TARGET MOLECULE

The target molecule is any one or more molecules
containing a target or putative target of interest. The
target is a binding interaction region. The target can be in
a single molecule or can be a product of a molecular complex.
The target can be a continuous or discontinuous binding
region. The target molecule selected for use (Fig. 1, step
1) is preferably any molecule that is found in vivo
(preferably in mammals, most preferably in humans) and that
has biological activity, preferably involved or putatively
involved in the onset, progression, or manifestation of a
disease or disorder. The target molecule can also be a
fragment or derivative of such an in vivo molecule, or a
chemical entity that contains the same target as the in vivo
molecule. Examples of such molecules are well known in the
art. Such molecules can be of mammalian, human, viral,
bacterial, or fungal origin, or from a pathogen, to give just
some examples. The target molecule is preferably a protein
or protein complex. The target molecules
include but are not limited to receptors, ligands for
receptors, antibodies or portions thereof (e.g., Fab, Fab',
F(ab')2, constant region), proteins or fragments thereof,
nucleic acids, glycoproteins, polysaccharides, antigens,
epitopes, cells and cellular components, subcellular
particles, carbohydrates, enzymes, enzyme substrates,
oncogenes (e.g., cellular, viral; oncogenes such as ras, raf,
etc.), growth factors (e.g., epidermal growth factor,
platelet-derived growth factor, fibroblast growth factor),
lectins, protein A, protein G, organic compounds,
organometallic compounds, viruses, prions, viroids, lipids,
fatty acids, lipopolysaccharides, peptides, cellular
metabolites, steroids, vitamins, amino acids, sugars,
lipoproteins, cytokines, lymphokines, hormones, T cell
surface antigens (e.g., CD4, CD8, T cell antigen receptor),
ions, organic chemical groups, viral antigens (hepatitis B
virus surface or core antigens, HIV antigens (e.g., gp120,
gp46)), hepatitis C virus antigens, toxins (e.g., bacterial
toxins), cell wall components, platelet antigens (e.g.,
gpIIbIIIa), cell surface proteins, cell adhesion molecules,
neurotrophic factors, and neurotrophic factor receptors.

In specific embodiments, vEGF (vascular endothelial
growth factor) or KDR (the receptor for vEGF) (Terman et al.,
1992, Biochem. Biophys. Res. Comm. 187:1579-1586) is the
target molecule. vEGF and its receptor are the major
regulators of vasculogenesis and angiogenesis (Millauer et
al., 1993, Cell 72:835). Inhibition of vEGF and the
concomitant inhibition of its mitogenic activity and
angiogenic capacity has been shown to suppress tumor growth
in vivo (Kendall et al., 1993, Proc. Natl. Acad. Sci. USA
90:10705-10709; Kim et al., 1993, Nature 362:841-844). Use
of vEGF or KDR, or portions thereof, as a target molecule is a
preferred embodiment for use of the present invention to
develop lead molecules as drugs in the area of cardiovascular
disease or cancer.

The proteins ras and raf, or portions thereof (e.g.,
modules [illegible]) are also preferred target
molecules, particularly in an embodiment wherein the methods
of the present invention are employed to develop lead
molecules for drugs that are cancer [illegible]. ras is a
member of an intracellular signaling cascade that controls
cell growth and differentiation (Cook and McCormick, 1994,
Nature 369:361-362). ras functions in signal transduction by
specifically recognizing the protein raf and bringing it to
the cell membrane (Hall, 1994, Science 264:1413-1414; Vojtek
et al., 1993, Cell 74:205-214). The recognition modules in
both ras and raf have been determined (Zhang et al., 1993,
Nature 364:308-313; Warne et al., 1993, Nature 364:352-355;
and Vojtek et al., 1993, Cell 74:205-214); in a specific
embodiment, such a recognition module is used as a target
molecule according to the invention.

In another specific embodiment, an [illegible]
target molecule. Such molecules are [illegible] the
invention to develop lead molecules for drugs in
cardiovascular disorders.

Target molecules for use can be obtained commercially
(where the target is commercially available), or can be
[illegible] has been modified [illegible]
structure that specifically [illegible]
molecule. In a preferred aspect, recombinant expression
methods well known in the art can be [illegible]
protein target molecule as a fusion [illegible] a
peptide affinity tag. Such affinity [illegible]
[illegible] to epitopes of [illegible]
(Evan et al., 1985, Mol. Cell. Biol. 5:3610-3616) [illegible]
(e.g., 5-7) of his residues (which bind to [illegible]
binding sequences such as [illegible], etc. Tags [illegible] incorporated
into protein targets at either the [illegible]
tag (e.g., biotin [illegible]
streptavidin), e.g., by biotinylation.

For example, a protein target can be purified by [illegible]
[illegible] the purification [illegible]

Once the target molecule has been purified it is
preferably tested to ensure that it [illegible]



WO 96/30849    PCT/US96/04229



of a molecule found in vivo, or is a chemical entity
putatively containing the same target as a molecule found in
vivo, it is highly preferred that testing be done of such
desired target molecules prior to their use, so that among
such desired target molecules, only those that have the same
biological activity as the in vivo molecule or compete with a
known ligand to the in vivo molecule, are selected for actual
use as target molecules according to the invention. In the
event that biological activity has been reduced or lost in a
recombinant protein relative to the native form of the
protein, the protein can be recombinantly expressed in a
different host (e.g., yeast, mammalian, or insect) and/or
with a variety of tags and location of tags (on either the
amino- or carboxy-terminal side), in order to attempt to
achieve, or to optimize, recovery of biological activity.



5.2. DIVERSITY LIBRARIES

According to a preferred embodiment of the invention,
diversity libraries are screened to select binders, which
specifically bind to the target molecule. Diversity
libraries are those containing a plurality of different
members. Generally, the greater the number of library
members, and the greater the [illegible] that all possible
members are represented, the more preferred the library. In
preferred embodiments, the diversity libraries have at least
10^[illegible] members, [illegible], or
10^[illegible] members.

Many libraries suitable for use are known in the art and
can be used. Alternatively, libraries can be constructed
using standard methods. Chemical (synthetic) libraries,
recombinant expression libraries, or polysome-based libraries
are exemplary types of libraries that can be used.

In a preferred embodiment, the library screened is a
constrained, or semirigid library (having some degree of
structural rigidity). Examples of constrained libraries are
described below. A linear, or nonconstrained library, is

libraries," that contain oligonucleotide identifiers for each
chemical polymer library member.

In another embodiment, biological random peptide
libraries are used to identify a binder which binds to a
target molecule of choice. Many suitable biological random
peptide libraries are known in the art and can be used or can
be constructed and used to screen for a binder that binds to
a target molecule, according to standard methods commonly
known in the art.

According to this approach, involving recombinant DNA
techniques, peptides are expressed in biological systems as
either soluble fusion proteins or viral capsid fusion
proteins.

In a specific embodiment, a phage display library, in
which the protein of interest is expressed as a fusion
protein on the surface of a bacteriophage, is used (see,
e.g., Smith, 1985, Science 228:1315-1317). A number of
peptide libraries according to this approach have used the
M13 phage. Although the N-terminus of the viral capsid
protein, protein III (pIII), has been shown to be necessary
for viral infection, the extreme N-terminus of the mature
protein does tolerate alterations such as insertions. The
protein pVIII is a major M13 viral capsid protein, which can
also serve as a site for expressing peptides on the surface
of M13 viral particles, in the construction of phage display
libraries. Other phage such as lambda have been shown also
to be able to display peptides or proteins on their surface
and allow selection; these vectors may also be suitable for
use in production of libraries (Sternberg and Hoess, 1995,
Proc. Natl. Acad. Sci. USA 92:1609-1613).

Various random peptide libraries, in which the diverse
peptides are expressed as phage fusion proteins, are known in
the art and can be used. Examples of such libraries are
described below.

Scott and Smith, 1990, Science 249:386-390 describe
construction and expression of a library of hexapeptides on
the surface of M13. The library was made by inserting a 33
base pair Bgl I digested oligonucleotide sequence into an Sfi
I digested phage fd-tet, i.e., fUSE5 RF. The 33 base pair
fragment contains a random or "degenerate" coding sequence
(NNK)6 where N represents G, A, T or C and K represents G or
T. Cwirla et al., 1990, Proc. Natl. Acad. Sci. USA
87:6378-6382 also described a library of hexapeptides expressed as
pIII gene fusions of M13 fd phage. PCT publication WO
91/19818 dated December 26, 1991 by Dower and Cwirla
describes a library of pentameric to octameric random amino
acid sequences.

Devlin et al., 1990, Science 249:404-406, describes a
peptide library of about 15 residues generated by an (NNS)
coding scheme for oligonucleotide synthesis in which S is G
or C.
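
The effect of the NNK and NNS coding schemes can be checked by enumerating the codons they permit against the standard genetic code. The following sketch does this; the function names `expand` and `summarize` are illustrative, and only the base symbols N, K, and S used in the text are defined.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code with codons ordered TTT, TTC, TTA, TTG, TCT, ..., GGG.
# "*" marks a stop codon.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# Degenerate base symbols as defined in the text.
DEGENERATE = {"N": "GATC", "K": "GT", "S": "GC"}


def expand(scheme):
    """List every codon allowed by a three-symbol scheme such as 'NNK'."""
    return ["".join(c) for c in product(*(DEGENERATE[s] for s in scheme))]


def summarize(scheme):
    """Return (number of codons, distinct amino acids encoded, stop codons)."""
    translated = [CODON_TABLE[c] for c in expand(scheme)]
    return len(translated), len(set(translated) - {"*"}), translated.count("*")


if __name__ == "__main__":
    for scheme in ("NNN", "NNK", "NNS"):
        print(scheme, summarize(scheme))
```

Both NNK and NNS reduce the 64 codons to 32 while still encoding all 20 amino acids, and both leave a single stop codon (TAG) in place of the three present in a fully random NNN scheme.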

Christian and colleagues have described a phage display
library, expressing decapeptides (Christian, R.B., et al.,
1992, J. Mol. Biol. 227:711-718). The DNA of the library was
constructed by use of an oligonucleotide comprising the
degenerate codons [NN(G/T)]10 (SEQ ID NO: 8) with a
self-complementary 3' terminus. This sequence forms a hairpin
which creates a self-priming replication site that was used
by [illegible] DNA polymerase to generate [illegible].
The double-stranded DNA was cleaved at the Sfi I sites at the
5' terminus and hairpin for cloning into the fUSE5 vector
described by [illegible].

Lenstra, 1992, J. Immunol. Meth. 152:149-157 describes a
library that was constructed by annealing oligonucleotides of
about 17 or 23 degenerate bases with an 8 nucleotide long
palindromic sequence at their 3' ends. This resulted in the
expression of random hexa- or octa-peptides as fusion
proteins with the β-galactosidase protein in a bacterial
expression vector. The DNA was then converted into a
double-stranded form with Klenow DNA polymerase, blunt-end ligated
into a vector, and then released as Hind III fragments.
These fragments were then cloned into an expression vector at
the sequence encoding the C-terminus of a truncated
β-galactosidase to generate 10^[illegible] recombinants.

Kay et al., 1993, Gene 128:59-65 describes a random 38
amino acid peptide phage display library.

PCT Publication No. WO 94/18318 dated August 18, 1994
describes random peptide phage display "TSAR libraries" that
can be used.

Other biological peptide libraries which can be used
include those described in U.S. Patent No. 5,270,170 dated
December 14, 1993 and PCT Publication No. WO 91/19818 dated
December 26, 1991.

In a specific embodiment, a "peptide-on-plasmid"
library, containing random peptides fused to a DNA binding
protein that links the peptides to the plasmids encoding
them, can be used (Cull et al., 1992, Proc. Natl. Acad. Sci.
USA 89:1865-1869).

Another alternative to phage display or chemically
synthesized libraries is a polysome-based library, which is
based on the direct in vitro expression of the peptides of
interest by an in vitro translation system (in some
instances, coupled to an in vitro transcription system).
These methods rely on polysomes to translate the genomic
information (in this case encoded by an mRNA molecule, in
some instances made [illegible]
DNA) (see, e.g., Korman et al., 1982, Proc. Natl. Acad. Sci.
USA 79:1844-1848). Such in vitro translation-based libraries
include but are not limited to those described in PCT
Publication No. WO 91/05058 dated April 18, 1991; and
Mattheakis et al., 1994, Proc. Natl. Acad. Sci. USA
91:9022-9026.

Diversity library screening, step 2 of Fig. 1,
determines a few, N, members (compounds) from one or more
libraries and their primary sequences, all of which
specifically bind to target molecule 1 in a similar manner.
A structured organic diversity library is a prescription for
the creation of a huge number of related molecules all built
from combinations of a small number of chemical building
blocks. Preferred diversity libraries for use according to
the invention have members whose binding to a target molecule
is characterized by configurational entropy changes that are
small relative to the binding energy. This means that
library members have definite structures in the bound and,
especially, the unbound states. A preferred example of a
chemical diversity library for use in the invention contains
short peptides with a constrained conformation. Short
peptides without constrained conformations are often freely
flexible in an aqueous environment and adopt no fixed unbound
structure. The binding of such library members is
complicated by significant configurational entropy changes.
To eliminate this complication, it is preferred that all
library members have a constrained structure and bind to the
target molecule in a specific and identifiable manner. One
method of achieving constrained conformation is to require
internal linking, such as by disulfide bonds.

In one embodiment, disulfide bond formation is achieved
by use of libraries that contain peptides having a pair of
invariant cysteine residues, preferably positioned in the
range of 2-16 residues apart, most preferably 6-8 residues
apart, that cross-link in an oxidizing environment to form
cystines (disulfide bonds between cysteines). An example of
such libraries are those containing or expressing peptides of
the form R1CXnCR2, wherein R1 is a sequence of 0-10 amino acids,
C is cysteine, Xn is a sequence of n variant amino acids
(e.g., if all 20 classical amino acids are represented, X
means any one of the 20 classical amino acids); n is an
integer ranging from 2 to 16; and R2 is a sequence of 0-10
amino acids. R1 and R2 can contain invariant or variant amino
acids. Another example of such libraries are those
containing or expressing peptides of the form R1CXnR2, where
R1, X, n, and R2 are as described above; n is preferably 8 or
9. A preferred constrained peptide library, of at least
10^[illegible] members, consists of peptides comprising the
sequence CX6C (SEQ ID NO:1), wherein C is cysteine, X is any naturally
occurring amino acid, and a disulfide bond is formed between
the two cysteines. Additional invariant amino acids (e.g.,
preferably no more than 5-10 amino acids) on either the
amino- or carboxy-terminus of CX6C can be incorporated as part
of the peptide in this preferred embodiment. Fig. 10
schematically illustrates such a molecule. The disulfide
bridge between the two cysteines acts as a sufficient
conformational constraint for the preferred practice of this
invention. By way of example, the library is constructed by
generating oligonucleotides with the desired degeneracy to
code for the peptides and ligating them into vectors of
choice. These inserted oligonucleotides are suitable for
use both in in vivo genetic expression systems exemplified by
phage display, and in in vitro translation methods based on
coupled transcription and translation from DNA of interest
(see below). The creation and use of an exemplary library is
described in Section 6.3 hereinbelow. The invention is
easily and readily adaptable to other alternative peptide
libraries which include short peptides with alternative
disulfide scaffolding, for example, comprising the sequence
CXnCXmCC with two disulfide bridges, wherein n and m are each
independently an integer in the range of 2-10, and X is any
amino acid. More generally, any peptide library containing
members of definite conformation which bind to a target
molecule in an identifiable manner may be used.
Further, more general, structurally constrained, organic
diversity (e.g., nonpeptide) libraries, can also be used. By
way of example, a benzodiazepine library (Bunin et
al., 1994, Proc. Natl. Acad. Sci. USA 91:4708-4712) may be
adapted for use.
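
Two properties of a CXnC library are easy to make concrete: its theoretical sequence diversity and the test of whether a given peptide contains the constrained motif. The sketch below assumes n = 6 and the 20 classical amino acids in one-letter code; the names `theoretical_diversity`, `CX6C`, and `contains_cx6c` and the example peptides are illustrative assumptions.

```python
import re

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 classical amino acids


def theoretical_diversity(n_variable, alphabet_size=len(AMINO_ACIDS)):
    """Number of distinct sequences with n_variable fully variant positions."""
    return alphabet_size ** n_variable


# C, then exactly six variant residues, then C; flanking residues
# (R1 and R2 in the text) are allowed on either side.
CX6C = re.compile(rf"C[{AMINO_ACIDS}]{{6}}C")


def contains_cx6c(peptide):
    """True if the peptide contains a Cys-X6-Cys motif."""
    return CX6C.search(peptide) is not None


if __name__ == "__main__":
    print(theoretical_diversity(6))     # 20**6 sequences for six variant positions
    print(contains_cx6c("GCARGDFSCA"))  # motif with one flanking residue each side
    print(contains_cx6c("CARGDC"))      # cysteines only four residues apart
```

The sequence test says nothing about whether the disulfide bond has actually formed; that depends on the oxidizing environment described above.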

Constrained libraries that can be used are also known in
the art. For example, PCT Publication No. WO 94/18318 dated
August 18, 1994 describes semirigid phage display libraries,
in which the plurality of expressed peptides can adopt only a
single or a small number of conformations. Examples of such
libraries have a pair of invariant cysteine residues
positioned in or flanking random residues which, when
expressed in an oxidizing environment, are most likely
cross-linked by disulfide bonds to form cystines. Also disclosed
are libraries having a cloverleaf structure by appropriate
arrangement of cysteine residues. Also disclosed are
libraries with peptides having invariant cysteine and
histidine residues positioned within the random residues, or
invariant histidines alone within the random residues.
TSAR-13 and TSAR-14 are exemplary semirigid libraries
disclosed therein.

Other conformationally constrained libraries that can be
used include but are not limited to those containing modified
peptides (e.g., incorporating fluorine, metals, isotopic
labels, are phosphorylated, etc.), peptides containing one or
more non-naturally occurring amino acids, non-peptide
structures, and peptides containing a significant fraction of
γ-carboxyglutamic acid.

As stated above, libraries of non-peptides, e.g.,
peptide derivatives (for example, that contain one or more
non-naturally occurring amino acids) can also be used. One
example of these are peptoid libraries (Simon et al., 1992,
Proc. Natl. Acad. Sci. USA 89:9367-9371). Peptoids are
polymers of non-natural amino acids that have naturally
occurring side chains attached not to the alpha carbon but to
the backbone amino nitrogen. Since peptoids are not easily
degraded by human digestive enzymes, they are advantageously
more easily adaptable to drug use. Another example of a
library that can be used, in which the amide functionalities
in peptides have been permethylated to generate a chemically
transformed combinatorial library, is described by Ostresh et
al., 1994, Proc. Natl. Acad. Sci. USA 91:11138-11142.

The peptide or peptide portions of members of the
libraries that can be screened according to the invention are
not limited to containing the 20 naturally occurring amino
acids. In particular, chemically synthesized libraries and
polysome based libraries allow the use of amino acids in
addition to the 20 naturally occurring amino acids (by their
inclusion in the precursor pool of amino acids used in
library production). In specific embodiments, the library
members contain one or more non-natural or non-classical
amino acids or cyclic peptides. Non-classical amino acids
include but are not limited to the D-isomers of the common
amino acids, α-amino isobutyric acid, 4-aminobutyric acid,
Abu, 2-amino butyric acid; γ-Abu, ε-Ahx, 6-amino hexanoic
acid; Aib, 2-amino isobutyric acid; 3-amino propionic acid;
ornithine; norleucine; norvaline, hydroxyproline, sarcosine,
citrulline, cysteic acid, t-butylglycine, t-butylalanine,
phenylglycine, cyclohexylalanine, β-alanine, designer amino
acids such as β-methyl amino acids, Cα-methyl amino acids,
Nα-methyl amino acids, fluoro-amino acids and amino acid
analogs in general. Furthermore, the amino acid can be D
(dextrorotary) or L (levorotary).

By way of example, the incorporation of non-standard or
modified amino acids into libraries can be done by taking
advantage of concurrent development in reassigning the
genetic code (Noren et al., 1989, Science 244:182-188;
Benner, 1994, Trend. BioTech. 12:158-163) and the charging of
specific tRNAs with the desired amino acid (Cornish et al.,
1994, Proc. Natl. Acad. Sci. USA 91:2910-2914). See also
Ibba and Hennecke, 1994, Bio/Technology 12:678-682
(particularly Table I), and references cited therein. These
pre-charged tRNAs are then utilized in [...] to incorporate
the non-standard amino acid into the library of choice. The
position of incorporation can be either random (variant) or
defined (invariant). The defined case can be chosen based on
the utility of the resulting placement of the non-natural
functional group to maximize either binding properties or the
ability to perform structural measurements. Similar
techniques may be used to incorporate non-standard amino
acids into the peptides.

In a specific embodiment, an iterative approach to
library construction can be taken, as structural information
on the mode of binding to a given target is obtained. For
example, information from structural analysis can be used to
make libraries with library members containing chemical
backbones that match known chemical scaffolds, enhance
solubility or membrane permeability, reduce the effect of
water on structure, and incorporate other physical parameters
suggested by structural analysis. Algorithmically optimized
library inserts can be used to increase the chances of
finding binders of interest (see, e.g., Arkin and Youvan,
1992, Bio/Technology 10:297-300).
In other embodiments, the following can be used to
improve library use in both phage and bacterial systems:
production of libraries in bacteria which overproduce the
chaperonins GroES and GroEL (Soderlind et al., 1993,
Bio/Technology 11:503-507), and production in E. coli strains
which prevent degradation in the periplasmic space (Strauch
and Beckwith, 1988, Proc. Natl. Acad. Sci. USA 85:1576-1580;
Lipinska et al., 1989, J. Bacteriology 171:1574-1584).
Purified cofactors such as GroES and GroEL could also be
directly added to an in vitro expression and selection
system.

5.3. SCREENING OF DIVERSITY LIBRARIES
Once a suitable diversity library has been constructed
(or otherwise obtained), the library is screened to identify
binders having binding affinity for the target. Screening is
done by contacting the diversity library members with the
target molecule under conditions conducive to binding and
then identifying the member(s) which bind to the target
molecule. Screening the libraries can be accomplished by any
of a variety of commonly known methods. See, e.g., the
following references, which disclose screening of peptide
libraries: Parmley and Smith, 1989, Adv. Exp. Med. Biol.
251:215-218; Scott and Smith, 1990, Science 249:366-390;
Fowlkes et al., 1992, BioTechniques 13:422-427; Oldenburg et
al., 1992, Proc. Natl. Acad. Sci. USA 89:5393-5397; Yu et
al., 1994, Cell 76:933-945; Staudt et al., 1988, Science
241:577-580; Bock et al., 1992, Nature 355:564-566; Tuerk et
al., 1992, Proc. Natl. Acad. Sci. USA 89:6988-6992; Ellington
et al., 1992, Nature 355:850-852; U.S. Patent No. 5,096,815,
U.S. Patent No. 5,223,409, and U.S. Patent No. 5,198,346, all
to Ladner et al.; Rebar and Pabo, 1993, Science 263:671-673;
and PCT Publication No. WO 94/18318. See also the references
cited in Section 5.2 hereinabove (disclosing libraries)
regarding methods for screening.

Screening can be carried out by contacting the library
members with an immobilized target molecule and harvesting
those library members that bind to the target. Examples of
such screening methods, termed "panning" techniques, are
[...] 73:305-318; Fowlkes et al., 1992, BioTechniques
13:422-427; PCT Publication No. WO 94/18318; and in
references cited [...] the libraries, the target molecule can
be immobilized on plates, beads, such as magnetic beads,
sepharose, etc., or on beads used in columns. In particular
embodiments, the immobilized target molecule has incorporated
an "affinity tag," as described above, which can be used to
effect [...]

[...] libraries [...] to immobilize the target molecule prior
to its use in the selection (screening) process. This method
can be improved upon, to increase throughput [...] solid
phase plastic supports can be replaced with magnetic [...]
beads can be used [...] hindrance, for use in bacterial
systems. This steric hindrance can be avoided by using high
gradient magnetic cell separation with small particles
(~0.5 μm) (Miltenyi et al., 1990, Cytometry 11:231-238).

[...] involving the use of a peptide phage display library,
selection of a binder protein expressed on the surface of a
bacteriophage thus selects both [...] being [...] library
members, phage are released from a solid [...] on which the
binder-target molecule co[...] is immobilized, and are
amplified, e.g., by infecting [...] and propagating each
isolated binding phage. Repeating this process of affinity
capture and amplification allows those peptides which bind
with the highest affinity to the target molecule to be
selectively enriched from the original library.

In one particular embodiment, presented by way of
example but not limitation, a phage display library can be
screened as follows using magnetic beads (see PCT Publication
No. WO 94/18318):

Target molecules are conjugated to magnetic beads, according
to the instructions of the manufacturers. The beads are
incubated with excess bovine serum albumin (BSA), to block
non-specific binding. The beads are then washed with numerous
cycles of suspension in phosphate buffered saline (PBS) with
0.05% Tween® 20 and recovered by drawing a strong magnet
along the sides of a plastic tube. The beads are then stored
under refrigeration, until use.

An aliquot of a library is mixed with a sample of resuspended
beads, at 4°C for a time period in the range of [...]. The
beads are then recovered with a strong magnet and the liquid
is removed by aspiration. The beads are then washed by
resuspension in PBS with 0.05% Tween® 20, and then drawing
the beads to the tube wall with the magnet. The contents of
the tube are removed and washing is repeated 5-10 additional
times. 50 mM glycine-HCl (pH 2.0), 100 μg/ml BSA solution is
added to the washed beads to denature proteins and release
bound phage. After a short incubation, the beads are drawn to
the side of the tubes with a strong magnet, and the liquid
contents are then transferred to clean tubes. 1 M Tris-HCl
(pH 7.5) or 1 M NaH2PO4 (pH 7) is added to the tubes to
neutralize the pH of the phage sample. The phage are then
diluted, e.g., 10^[...] to 10^[...], and aliquots plated with
E. coli DH5αF' cells to determine the number of plaque
forming units of the sample. In certain cases, the platings
are done in the presence of XGal and IPTG for color
discrimination of plaques (i.e., lacZ+ plaques are blue,
lacZ- plaques are white). The titer of the input samples is
also determined for comparison.

Alternatively, as yet another non-limiting example,
screening a diversity library of phage expressing peptides
can be achieved by panning using microtiter plates (see PCT
Publication No. WO 94/18318) as follows:

The target molecule is diluted and a small aliquot of target
molecule solution is adsorbed onto wells of microtiter plates
(e.g., by incubation overnight at 4°C). An aliquot of BSA
solution (1 mg/ml, in 100 mM NaHCO3, pH 8.5) is added and the
plate incubated at room temperature for 1 hr. The contents of
the microtiter plate are flicked out and the wells washed
carefully with PBS-0.05% Tween® 20. The plates are repeatedly
washed free of unbound target molecules. A small aliquot of
phage solution is introduced, and the wells are incubated at
room temperature for 2-24 hrs. The contents of the microtiter
plates are flicked out and the wells are washed repeatedly.
The plates are incubated with wash solution in each well for
20 minutes at room temperature to allow bound phage with
rapid dissociation constants to be released. The wells are
then washed five more times to remove all unbound phage.

To recover the phage bound to the wells, a pH change is used.
An aliquot of 50 mM glycine-HCl (pH 2.0), 100 μg/ml BSA
solution is added to the washed wells to denature proteins
and release bound phage. After 10 minutes at 65°C, the
contents are then transferred into clean tubes, and a small
aliquot of 1 M Tris-HCl (pH 7.5) or 1 M NaH2PO4 (pH 7) is
added to neutralize the pH of the phage sample. The phage are
then diluted, e.g., 10^[...] to 10^[...], and aliquots plated
with E. coli DH5αF' cells to determine the number of plaque
forming units of the sample. In certain cases, the platings
are done in the presence of XGal and IPTG for color
discrimination of plaques (i.e., lacZ+ plaques are blue,
lacZ- plaques are white). The titer of the input samples is
also determined for comparison (dilutions are generally
10^[...] to 10^[...]).

By way of another example, diversity libraries
expressing peptides as a surface protein of either a particle
or a host cell, e.g., phage or bacterial cell, can be
screened by passing a solution of the library over a column
of the target molecule immobilized to a solid matrix, such as
sepharose, silica, etc., and recovering those particles or
host cells that bind to the column after washing and elution.

In yet another embodiment, screening a library can be
performed by using a method comprising a first "enrichment"
step and a second filter lift step as described in PCT
Publication No. WO 94/18318.

Several rounds of serial screening are preferably
conducted. In a particularly preferred aspect, each round is
varied slightly, e.g., by changing the solid phase on which
immobilization occurs, or by changing the method of
immobilization on (e.g., by changing the linker to) the solid
phase. When using a phage display library, the recovered
cells are then preferably plated at a low density to yield
isolated colonies for individual analysis. By way of
example, the following is done: The individual colonies are
selected, grown and used to inoculate LB culture medium
containing ampicillin. After overnight culture at 37°C, the
cultures are then spun down by centrifugation. Individual
cell aliquots are then retested for binding to the target
molecule attached to the beads. Binding to other beads,
having attached thereto a non-relevant molecule, can be used
as a negative control.

In a specific embodiment, different rounds of screening
can respectively involve selection against targets in
primarily their purified form, and then in their natural
state (e.g., on the surface of a mammalian cell) (see, e.g.,
Marks et al., 1993, Bio/Technology 11:1145-1149, describing
selection against cell surface blood group antigens).

In other examples, subsequent rounds of screening can
involve immobilization of the target molecule by attachment
at different ends (e.g., amino or carboxy-terminus) of the
target molecule to a solid support, or presentation of
library members by attachment to or fusion at different ends
of the library members.

By way of other examples of screening methods that can
be used, genetic selection methods can be adapted for
screening of libraries, or can be used in a recursive scheme.
Thus, in a specific aspect, the invention provides screening
methods in which methods allowing high throughput and
diversity screening (e.g., screening phage display or
polysome libraries against a ligand) are utilized in initial
rounds, with subsequent rounds employing a genetic selection
technique, in which the presence of a binder of appropriate
specificity increases the activation of a transcriptional
promoter or origin of replication. Genetic selection
techniques that can be adapted for use (e.g., by inserting
random oligonucleotides in the test plasmid) include the
two-hybrid system for selecting interacting proteins in
yeast, replicative based systems in mammalian cells, and
others (see, e.g., Fields & Song, 1989, Nature 340:245-246;
Chien et al., 1991, Proc. Natl. Acad. Sci. USA 88:9578-9582;
Vasavada et al., 1991, Proc. Natl. Acad. Sci. USA
88:10686-10690). Thus, in a specific embodiment, compounds
are produced as fusion proteins, and contacted with a
different fusion protein comprising a target fused to another
molecule, in which specific binding of the fusion proteins to
each other results in an increase in activity or activation
of a transcriptional promoter or an origin of replication.
In a specific embodiment, a genetic selection method is used
in a later round of screening to either select directly for a
library member that binds to a target molecule, or to select
a library member that competitively inhibits binding of a
ligand to the target molecule.

Several exemplary methods for screening a phage/phagemid
library are presented by way of example in Section 6.4
hereinbelow. An exemplary method for screening a
polysome-based library is presented in Section 6.3.3
hereinbelow.

Once binders that bind to a target molecule of interest
are selected from a diversity library, additional assays are
preferably, although optionally, performed, including but not
limited to those described below. Thus, in vivo or in vitro
assays can be performed to test whether binding of a binder
to the target molecule affects the target molecule's
biological activity; binders that exert such an effect are
preferred for use in subsequent steps of the invention.
Alternatively, or in addition, competitive binding assays can
be carried out to test whether the binder competes with other
binders or with a natural ligand of the target molecule, for
binding to the target molecule; binders that compete with
each other, and that compete with the natural ligand, are
preferably selected for use in subsequent steps of the
invention. Alternatively, or in addition to the above
assays, the binding affinity of binders for the target
molecule can be determined, for example, as described in
Section 6.5 infra. Binders of the highest affinity are
preferred for use in subsequent steps of the invention.

5.4. DETERMINING THE SEQUENCE OR
CHEMICAL FORMULA OF BINDERS

Many of the references cited in Sections 5.2 and 5.3
hereinabove, which disclose library construction and/or
screening, also disclose methods that can be used to
determine the sequence or chemical formula of binders
isolated from such libraries. By way of example, a nucleic
acid which expresses a binder can be identified and recovered
from a peptide expression library or from a polysome-based
library, and then sequenced to determine its nucleotide
sequence and hence the deduced amino acid sequence that
mediates binding. (In an instance wherein the sequence of an
RNA is desired, cDNA is preferably made and sequenced.)
Alternatively, the amino acid sequence of a binder can be
determined by direct determination of the amino acid sequence
of a peptide selected from a peptide library containing
chemically synthesized peptides. In a less preferred aspect,
direct amino acid sequencing of a binder selected from a
peptide expression library can also be performed.

Nucleotide sequence analysis can be carried out by any
method known in the art, including but not limited to the
method of Maxam and Gilbert (1980, Meth. Enzymol. 65:499-560),
the Sanger dideoxy method (Sanger et al., 1977, Proc.
Natl. Acad. Sci. U.S.A. 74:5463), the use of T7 DNA
polymerase (Tabor and Richardson, U.S. Patent No. 4,795,699;
Sequenase™, U.S. Biochemical Corp.), or Taq polymerase, or
use of an automated DNA sequenator (e.g., Applied Biosystems,
Foster City, CA).

Direct determination of the chemical formulas of
non-peptide or peptide binders can be carried out by methods
well known in the art, including but not limited to mass
spectrometry, NMR, infrared analysis, etc.

In preferred aspects involving certain types of
libraries well known in the art, sequencing or use of
known analytic techniques for chemical formula determination
will not be necessary. In some such libraries, the identity
and composition of each member of the library is uniquely
specified by a label or "tag" which is physically associated
with it and hence the compositions of those members that bind
to a given target are specified directly (see, e.g., Ohlmeyer
et al., 1993, Proc. Natl. Acad. Sci. USA 90:10922-10926;
Brenner et al., 1992, Proc. Natl. Acad. Sci. USA
89:5381-5383; Lerner et al., PCT Publication No.
WO 93/20242). In other examples of such libraries, the
library members are created by stepwise synthesis protocols
accompanied by complex record keeping, complex mixtures are
screened, and deconvolution methods are used to elucidate
which individual members were in the sets that had binding
activity, and hence which synthesis steps produced the
members and the composition of individual members (see, e.g.,
Erb et al., 1994, Proc. Natl. Acad. Sci. USA 91:11422-11426).

Step 2 of the invention provides as output N binding
library members (binders) and their sequences or chemical
formulas.

5.5. CANDIDATE PHARMACOPHORE SELECTION
The prior diversity library screening, step 2,
determines a set of size N of specifically binding members
from one or more diversity libraries. While the binders are
preferably but not necessarily isolated from one or more
diversity libraries (e.g., binders need not be isolated from
diversity libraries; known binders can be simply provided),
the following description shall refer to the preferred
embodiment wherein diversity library members are the binders.
It will be apparent that the description is also readily
applicable to binders that are not isolated from diversity
libraries.

The pharmacophore responsible for the library member
binding is preferably determined by an overall select and
test method in this and subsequent steps. In general, a
pharmacophore is specified by the precise electronic
properties on the surface of the binder that cause binding
to the surface of the target molecule. In the preferred
embodiment, these properties are specified by the underlying,
causative, chemical structures. Chemical structures are
specified generally by groups such as -CH2-, -COOH, and
-CONH2. The preferred pharmacophore representation consists
of a specification of the underlying chemical groups and
their geometric relations. The more precisely the geometric
relations are specified, the more preferred. In preferred
but not limiting aspects, the geometric relations are precise
to at least 0.50 Å, and most preferably, at least 0.25 Å. A
pharmacophore will usually comprise 2 to 4 of such groups,
with 3 being typical. However, for complex protein
recognition targets, a pharmacophore may comprise a greater
number of groups. For example, it is possible that the
entire 6 amino acid sequence, -X6-, may be needed for a member
of the preferred CX6C library to bind to complex targets, in
which case the pharmacophore includes the entire binder.

Considering, by way of example, the case of binders
isolated from the preferred library, of sequence CX6C, the
chemical groups defining a peptide pharmacophore are terminal
groups on amino acid side chains. Typically, therefore, a
sequence of two to four contiguous amino acids will contain
the pharmacophore of interest. For example, Fig. 11
illustrates an Arginine-Glycine-Aspartate sequence forming a
well known platelet aggregation inhibiting pharmacophore,
which is defined by the positions and orientations of the
adjacent [...] and -COOH groups. Pharmacophores
formed by discontiguous amino acids are not likely to occur
in the preferred library due to the conformational constraint
on the short peptide imposed by the disulfide bridge.

The selection step determines candidate amino acid
sequences in each binder that define a candidate
pharmacophore [...] terminal groups.

Candidate selection depends substantially only on the
chemical structures of the amino acid side chains and
terminal groups [...]. Geometric structure is not yet
available and cannot be used for candidate selection. In the
preferred embodiment, amino acids are grouped into homologous
groups defined by group members having similar side chain
structure and activity (see infra). Candidate pharmacophores
are found by searching the sequences of the N binders for
short sequences of homologous amino acids. This search will
produce at least one candidate, because all the binders share
the actual pharmacophore. Several candidates will usually be
found since geometric information is ignored, and the search
is thereby underdetermined.

Fig. 2A illustrates an exemplary method of performing
the search for homologous sequences. Although this method is
illustrated as searching for homologous contiguous sequences
of length 3, it is easily adaptable to search for homologies
of other lengths and also for discontiguous homologous
sequences. If no candidate pharmacophores of length 3 have a
consistent consensus structure, then pharmacophores of length
2, 4, or longer, or discontiguous sequences, must be searched
and selected for test. For some complex targets, the
pharmacophore may include the entire variable part of the
library member. The exemplary method is a simple depth-first
search for matching amino acid strings. More sophisticated
string search methods are known and are equally applicable to
this invention.

The method begins with the administrative steps 201 and
202 of labeling the binders with integers from 1 to N and
assigning the string variable 'ABC' to the next left most
sequence of three amino acids to test in binder 1. If this
is the first candidate selection, 'ABC' will be at the left
most position in binder 1. If prior candidates have been
selected, 'ABC' will be assigned one amino acid to the right
of its prior assignment. The FOR loop, formed by steps 203,
206, and 207, then selects each binder from 2 to N for
scanning for a sequence homologous to 'ABC'. Step 203 does
loop administration. Step 206 does the scanning. If
homologous sequences are found, test 207 loops back to scan
the next binder. If homologous sequences have been found in
all binders from 2 to N, the loop exits at step 204. In this
case 'ABC' is a string in binder 1 which is homologous to
other strings in all remaining binders and is thus a
candidate pharmacophore. The method exits at 205 for this
candidate to be structured and tested for whether it is the
actual pharmacophore. If a binder does not have a sequence
homologous to 'ABC', then this string is not a candidate. In
this case, test 208 determines if 'ABC' is at the right end
of binder 1. If so, there are no more homologies to test for
and the method exits at 209. If not, then 'ABC' is advanced
one amino acid to the right 210 and the scan of all binders
is repeated beginning at 203.
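
The search of Fig. 2A can be sketched in Python as follows. This is a
minimal illustration under stated assumptions, not the program listing
of Sec. 8: binders are assumed to be given as one-letter amino acid
strings for the variable region only, the function names are
hypothetical, and homology is taken to mean membership in the same one
of the four side chain classes named in this section. Unlike the
flowchart, which exits at 205 with one candidate at a time, the sketch
yields every candidate in turn.

```python
# Homology classes named in the text, as one-letter codes (assumed encoding).
CLASSES = {
    "nonpolar": "ALIVPFWM",
    "polar_neutral": "GSTCYNQ",
    "basic": "RKH",
    "acidic": "DE",
}
CLASS_OF = {aa: name for name, members in CLASSES.items() for aa in members}


def homologous(a: str, b: str) -> bool:
    """Equal-length strings are homologous if corresponding residues share a class."""
    return len(a) == len(b) and all(CLASS_OF[x] == CLASS_OF[y] for x, y in zip(a, b))


def find_matches(abc: str, binder: str) -> list[int]:
    """Scan one binder (step 206); return offsets of substrings homologous to abc."""
    k = len(abc)
    return [i for i in range(len(binder) - k + 1) if homologous(abc, binder[i:i + k])]


def candidate_pharmacophores(binders: list[str], k: int = 3):
    """Yield (offset in binder 1, 'ABC' string, match offsets in binders 2..N)."""
    first, others = binders[0], binders[1:]
    for start in range(len(first) - k + 1):   # advance 'ABC' one residue to the right
        abc = first[start:start + k]
        matches = []
        for binder in others:                 # FOR loop over binders 2..N
            hits = find_matches(abc, binder)
            if not hits:                      # a binder lacks a homolog: reject 'ABC'
                break
            matches.append(hits)
        else:                                 # homologs found in every other binder
            yield start, abc, matches
```

For the made-up binders "RGDSPK", "AKGESL" and "THGDAV", the only
window of binder 1 with a homolog in both other binders is RGD
(basic, polar neutral, acidic), matched by KGE and HGD.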

Fig. 2B illustrates how string variable 'ABC' is scanned
across binder 1, represented schematically by 220. First,
'ABC' is assigned to X1X2X3 at 221, then to X2X3X4 at 222, to
X3X4X5 at 223, and finally to X4X5X6 at 224.

Given an assignment to 'ABC', step 206 scans each other
binder, for example binder K with K>1, for homologous
sequences. This is simply done by comparing all contiguous
substrings of binder K with 'ABC' to determine if they are
homologous. They are homologous if corresponding amino acids
in the substring and 'ABC' are homologous. In turn, two
amino acids are homologous if they satisfy established
homology rules. Each homologous sequence found in binder K
defines a separate candidate pharmacophore, if sequences
homologous to 'ABC' are found in all other binders.

In a case where discontiguous homologous sequences are
sought, 'ABC' is assigned to amino acids in discontiguous
positions in binder 1 and then compared for homologies to
amino acids in the same relative positions throughout the
other binders.
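
One way to picture the discontiguous case is to describe the positions
by a tuple of relative offsets and slide that pattern along each
binder. The sketch below is only an illustration under assumptions:
the offset-tuple representation, the function names, and the
one-letter encoding of the four side chain classes are not taken from
the specification.

```python
# Side chain classes from this section, as one-letter codes (assumed encoding).
CLASS_OF = {aa: cls for cls, members in {
    "nonpolar": "ALIVPFWM", "polar_neutral": "GSTCYNQ",
    "basic": "RKH", "acidic": "DE"}.items() for aa in members}


def pick(seq: str, start: int, offsets: tuple[int, ...]) -> str:
    """Residues of seq found at start plus each relative offset."""
    return "".join(seq[start + o] for o in offsets)


def gapped_hits(query: str, binder: str, offsets: tuple[int, ...]) -> list[int]:
    """Start positions where the residues at `offsets` are class-wise homologous to query."""
    span = max(offsets) + 1                       # residues covered by the pattern
    return [s for s in range(len(binder) - span + 1)
            if all(CLASS_OF[q] == CLASS_OF[b]
                   for q, b in zip(query, pick(binder, s, offsets)))]
```

With offsets (0, 2, 3), the made-up binder "RAGDSL" gives the query
R, G, D at start 0, and the made-up binder "TKLSEV" contains a
homologous set K, S, E at start 1.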

Various rules of amino acid homology may be used in this
invention. In the preferred embodiment, amino acids are
homologous if they are found in the same class of amino
acids, based on side chain activity (see Lehninger,
Principles of Biochemistry (1982), chap. 5). Preferred
homologous groups of amino acids are as follows. The
nonpolar (hydrophobic) amino acids include alanine, leucine,
isoleucine, valine, proline, phenylalanine, tryptophan and
methionine. The polar neutral amino acids include glycine,
serine, threonine, cysteine, tyrosine, asparagine, and
glutamine. The positively charged (basic) amino acids
include arginine, lysine and histidine. The negatively
charged (acidic) amino acids include aspartic acid and
glutamic acid. The foregoing classes may be modified by
those skilled in chemical arts to create finer
classifications. For example, phenylalanine and tryptophan
could be placed in a separate aromatic nonpolar group.
Further, homology rules could depend on amino acid sequence,
such as by dividing contiguous doublets or triplets of amino
acids into homology groups.
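
The effect of refining the classes can be illustrated with a small
Python sketch. The dictionary layout, the one-letter codes, and the
helper names `refine` and `same_class` are assumptions made for
illustration; only the class memberships and the aromatic nonpolar
example come from the text above.

```python
# The four preferred classes, written with one-letter amino acid codes.
BASE_CLASSES = {
    "nonpolar": "ALIVPFWM",       # Ala, Leu, Ile, Val, Pro, Phe, Trp, Met
    "polar_neutral": "GSTCYNQ",   # Gly, Ser, Thr, Cys, Tyr, Asn, Gln
    "basic": "RKH",               # Arg, Lys, His
    "acidic": "DE",               # Asp, Glu
}


def refine(classes: dict[str, str], new_name: str, members: str) -> dict[str, str]:
    """Return a copy of `classes` with `members` moved into a new class."""
    refined = {name: "".join(aa for aa in group if aa not in members)
               for name, group in classes.items()}
    refined[new_name] = members
    return refined


def same_class(classes: dict[str, str], a: str, b: str) -> bool:
    """Homology rule: two amino acids are homologous if they share a class."""
    lookup = {aa: name for name, group in classes.items() for aa in group}
    return lookup[a] == lookup[b]


# Finer classification suggested in the text: a separate aromatic nonpolar group.
FINER_CLASSES = refine(BASE_CLASSES, "aromatic_nonpolar", "FW")
```

Under the base classes phenylalanine and leucine are homologous; under
the finer classes phenylalanine is homologous to tryptophan but no
longer to leucine, so a finer rule set can only shrink the set of
candidate pharmacophores.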
The invention is not limited to the above-described
exemplary method of selecting candidate pharmacophores. Any
automatic method of selecting candidates that depends only on
chemical structure of binder library members, preferably
expressed in terms of building block composition and
sequence, can be used. For example, in the case of the
preferred CX6C library, candidates could be selected by a
clustering analysis performed on the entire amino acid string
in a multi-dimensional space.

The above method of selecting candidate pharmacophores
is not limited to the preferred CX6C diversity library. For
example, this method is immediately applicable to any
diversity library having members comprising building blocks
linked by a linear backbone by simply specifying rules of
homology appropriate for the building blocks. These homology
rules would group building blocks presenting similar
structure and reactivity to targets. This method then
selects candidates comprising sequences of homologous
building blocks present on all the binding library members.
If the library members do not have a linear backbone, a
related candidate selection method can be used. In this
case, the search for homologous building blocks would need to
be confined to adjacent building blocks. Adjacent building
blocks in this case are those building blocks brought
physically close by whatever chemical structures form the
library members (instead of simply being linearly adjacent on
a backbone). An adjacency determination would be specific to
the particular chemical structure and would be algorithmically
specified. In addition, appropriate rules of homology would
be specified. The method would then select candidates
comprising groups of adjacent, homologous building blocks, a
group being present on each binding library member.

The above-described step is the selection step of the
overall select and test method. Distance measurements and
Monte Carlo structuring, steps 4 and 5, determine a consensus
pharmacophore structure for the candidate, if possible. If a
consensus is found, the candidate is the actual
pharmacophore. If a consensus is not found, this selection
step must be revisited, and a new candidate selected for
test.
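
The control flow of the overall select and test method can be outlined
as a short Python sketch. It is only a schematic: `find_consensus` is
a hypothetical stand-in for the distance measurements and Monte Carlo
structuring of steps 4 and 5, assumed here to return a structure when
a consensus exists and None otherwise.

```python
from typing import Callable, Iterable, Optional, Tuple, TypeVar

Candidate = TypeVar("Candidate")
Structure = TypeVar("Structure")


def select_and_test(
    candidates: Iterable[Candidate],
    find_consensus: Callable[[Candidate], Optional[Structure]],
) -> Optional[Tuple[Candidate, Structure]]:
    """Return the first candidate that has a consensus structure, or None."""
    for candidate in candidates:                # selection step: next candidate
        structure = find_consensus(candidate)   # stand-in for steps 4 and 5
        if structure is not None:               # consensus found: actual pharmacophore
            return candidate, structure
    return None                                 # candidates exhausted without consensus
```

A None result corresponds to the situation described earlier in this
section, in which candidates of another length, or discontiguous
candidates, must be selected for test.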

5.6. INTRAMOLECULAR DISTANCE MEASUREMENTS

Having obtained N binders, their chemical building block
structures (chemical formula or primary sequence), and the
identification of a candidate pharmacophore in each binder,
steps 4 and 5 of the method of this invention cooperatively
determine a precise spatial structure for the candidate
pharmacophore (if it exists; if not, a new candidate
pharmacophore is selected). In the preferred (but not
limiting) embodiment of this invention, N members of the CX6C
library that specifically bind to the protein target of
interest have been screened; their sequences determined; and
a candidate pharmacophore consisting of homologous triplets
(more generally from 2 to 6 mers) of amino acids has been
determined in each binder.

Step 4 measures one or more strategic interatomic
distances: preferably no more than 10-20, e.g., 1-10 or, more
preferably, 1-5. The remainder of the structure is determined
in subsequent steps, other than by direct measurement. The
interatomic distances measured in step 4 are preferably
accurate to at least 2 Å, more preferably to at least 1 Å,
0.5 Å or 0.25 Å, and most preferably to at least 0.05 Å.
Thus, in a preferred but not limiting embodiment, distances in
the pharmacophore are specified to at least approximately
0.25 Å. Step 5, using the CCMBC computational method, then
completes determination of the pharmacophore structure at a
high resolution and the structures of the rest of the binder
molecules with a secondary resolution. Having a high
resolution structure for the pharmacophore of interest is
orders of magnitude more useful than having a low resolution
structure for an entire binder. Consequently, steps 4 and 5
focus resources on the former problem.
A distance measurement method is preferred for use if it
meets certain conditions, as follows. First, accuracy of
distance measurements is preferably at least 0.25 Å for
distances on the order of those between amino acids in
a peptide. Second, measurement conditions preferably
approximate target binding conditions, i.e., are
approximately physiologic. For example, crystallization,
which may induce conformational changes, is preferably
avoided. Also, the employed measurement methods preferably
allow one binder sample to be measured when dry, when
hydrated and when bound to the target molecule of interest,
thereby observing the effects of water and conformational
changes on binding. Third, the measurement method is
preferably quick and inexpensive.

Important advantages are conveyed by these conditions. First,
as the method of the invention determines high resolution
pharmacophore structures, use of distances less accurate than
the intended results would almost certainly result in decreased
resolution. Second, as the CCBMC structure determination method
approximates the structural effects of hydration and target
binding, use of accurate distances including the physical
effects of hydration or binding helps increase the resolution
of the computational results. These distances as used in the
CCBMC method pull the binder structures towards a more accurate
representation both of the bound, hydrated pharmacophore and
also of the remainder of the binder molecule without a
computationally burdensome inclusion of water molecules and
without knowledge of the target molecule's structure.
REDOR NMR is the preferred method of distance determination.
REDOR is a solid phase NMR technique which directly measures
the inter-nuclear dipole-dipole interaction strength between
two spin 1/2 nuclear species, denoted D_AB, where A and B are
the two nuclear species measured. The inter-nuclear distance
between A and B is simply determined from D_AB by the following
equation:

$$D_{AB} = \frac{\gamma_A \gamma_B h}{2\pi R_{AB}^{3}} \qquad (1)$$

where R_AB is the inter-nuclear distance, h is Planck's
constant, and γ_A and γ_B are the respective gyromagnetic
ratios of nuclei A and B. REDOR is typically accurate to
within 0.05 Å and can generally measure distances up to
about 8 Å.
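The relation between dipolar coupling and distance can be sketched numerically. The sketch below is illustrative only: it assumes the SI form of the coupling, D = (mu0/4pi) * gamma_A * gamma_B * hbar / (2*pi*R^3), expressed in Hz, and standard tabulated gyromagnetic ratios; the function names are hypothetical and the unit convention may differ from that of equation 1.

```python
import math

MU0_OVER_4PI = 1.0e-7        # T*m/A (SI)
HBAR = 1.054571817e-34       # J*s
# Gyromagnetic ratios in rad/(s*T); tabulated values, signs included.
GAMMA = {"1H": 2.6752218744e8, "13C": 6.728284e7, "15N": -2.712618e7}

def dipolar_coupling_hz(nuc_a: str, nuc_b: str, r_angstrom: float) -> float:
    """Magnitude of the dipolar coupling (Hz) for two nuclei r_angstrom apart."""
    r_m = r_angstrom * 1.0e-10
    gg = abs(GAMMA[nuc_a] * GAMMA[nuc_b])
    return MU0_OVER_4PI * gg * HBAR / (2.0 * math.pi * r_m ** 3)

def distance_angstrom(nuc_a: str, nuc_b: str, coupling_hz: float) -> float:
    """Invert the R^-3 law: distance (Å) from a measured coupling (Hz)."""
    d_at_1a = dipolar_coupling_hz(nuc_a, nuc_b, 1.0)
    return (d_at_1a / coupling_hz) ** (1.0 / 3.0)

if __name__ == "__main__":
    for r in (1.0, 2.5, 5.0, 8.0):
        print(r, round(dipolar_coupling_hz("13C", "15N", r), 2))
```

Under these assumptions a 13C-15N pair has a coupling of roughly 3060 Hz at 1 Å, about 196 Hz at 2.5 Å and about 24.5 Hz at 5 Å, which shows how quickly the measurable interaction weakens with distance.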

Any two nuclear species observable and resolvable by NMR
methods and, preferably, adaptable to chemical inclusion in
the diversity library members of interest, may be the basis
of REDOR measurements. Although the subsequent description
is often directed to distance determinations between 13C and
15N nuclei in members of a preferred library comprising the
sequence CX6C, this invention is not so limited. One skilled
in the art can readily adapt the method for use in making
measurements of other types of molecules (e.g., peptides and
nonpeptides); additionally, other nuclear species may be
used. Other common spin 1/2 species that can be used include
but are not limited to 31P and the halogen 19F.

General references on NMR techniques are Slichter,
Principles of Magnetic Resonance, Berlin, Springer-Verlag
(1989) and Mehring, High Resolution NMR in Solids, Berlin,
Springer-Verlag (1983). REDOR references include Gullion et
al., Rotational-echo double-resonance NMR, J. Magn. Res.
81:196-200 (1989); Pan et al., Determination of C-N
internuclear distance by rotational-echo double-resonance NMR
of solids, J. Magn. Res. 90:330-40 (1990); Garbow et al.,
Determination of the molecular conformation of melanostatin
using 13C, 15N-REDOR NMR spectroscopy, J. Am. Chem. Soc.
115:238-44 (1993), all of which are incorporated herein by
reference.
Other solid phase NMR techniques are applicable but less
preferred. These include but are not limited to those
disclosed in Kolbert et al., Measurement of internuclear
distances by switched angle spinning, J. Physical Chemistry
98:7936 et seq. (1994), and in Raleigh et al., Rotational
Resonance NMR, Chemical Physics Letters 146:71 (1988). These
techniques measure homonuclear distances only to 0.5 Å
accuracy and are less accurate than REDOR. Liquid phase NMR
techniques of NOE (nuclear Overhauser effect) and COSY
(correlation spectroscopy) can also be used but are less
preferred. They require complex interpretation to obtain
comparable distance accuracy better than 0.5 Å in small
molecules with complete rotational freedom.

X-ray crystallography can also be used, although it is
much less preferred, since crystallization may induce
conformational changes in the binder, and since binding to
the target molecule may be necessary for crystallization.

In the case of REDOR measurements of the heteronuclear
distances between 13C and 15N, 13C and 15N are introduced
("labeled") at the positions between which a distance
measurement is needed. The preferred embodiment of the
invention measures the 15N NMR resonance. Since nearly all
the 15N signal will originate with nuclear labels, very little
background signal due to natural abundance nuclei need be
accounted for. Alternatively, the 13C resonance may be
measured, in which case the natural abundance background is
subtracted from the measurements.

Since REDOR depends on observing the internuclear
dipole-dipole interaction, the binder being measured should
be substantially stationary on the time scale of the NMR
signal. The measurement system preferably ensures this
condition. The substrate holding the binder to be measured
can be chosen so as to restrain binder motion, or the
measured sample may be cooled to restrain motion, or,
alternatively, the binder may be bound to its target molecule
in order to restrain its motion.
Further details of the REDOR distance measurements will
make reference to Fig. 3. This illustrates the measurement
method for one labeling of one binder, which is repeated if
the binder requires multiple labelings and also is repeated
for each binder. Subsequent description will focus on only
one binder.

Step 41 chooses a binder labeling. Labeling is
preferably done to obtain the most information about the
pharmacophore consistent with chemical labeling opportunities
and available labeled amino acids. Backbone labeling, for
example, labels the amide N of one amino acid and one of the
backbone C's of a next adjacent or more distant amino acid.
Backbone labeling is typically done in the backbone in the
vicinity of the candidate pharmacophore. It might also be
done away from a candidate pharmacophore to confirm a
previously determined structure as described for step 6.
Side chain labeling strategies vary with the chemical
opportunities offered by the candidate pharmacophore. If a
terminal N is available, an adjacent side chain or backbone C
can be labeled. If not, the side chain C and backbone amino
N can be labeled. Side chain labeling is preferably on side
chains in the candidate pharmacophore. Preferred labeling in
the candidate pharmacophore is either a backbone amino N and
a nearby backbone C or a side chain C or, if available, a
side chain amino N and an adjacent or nearby side chain C.

In an alternative embodiment, to get the most structural
information on the binders, these labelings are designed to
select the actual major conformation from known possible
conformations. For example, if it is known from preliminary
determinations that a binder may exist in one of a few, e.g.,
two, major backbone or side chain folding patterns, the
labelings are chosen to distinguish these conformations.
Nuclear pairs labeled for measurement are preferably those
that have significantly different distances in the possible
conformations.

Multiple labeling of one binder to determine multiple
distances at once is possible, for example, by including one
dipole-dipole interaction being observed, which varies with
the nuclear species being observed and the measurement
distance. For 13C-15N observations to 2.5 Å the binder motion
frequency should be less than approximately 200 Hz; for
observations to 5 Å, less than approximately 30-50 Hz; and
for observations beyond 5 Å, progressively lower, down to
approximately 10 Hz. The more polar the substrate, such as
glass beads or p-methylbenzhydrylamine ("mBHA") resin, the
more strongly attached polar binders (as many peptides are)
are restrained. Less polar substrates, such as polystyrene
resin, provide less restraint for polar binders. In an
embodiment wherein a peptide comprising the sequence CX6C is
bound to an mBHA resin with a glycine residue serving as a
linker to a binding site on the resin, probably no additional
steps need be taken for 2-5 Å measurements. Additional steps
that can be used, if needed, to slow binder motions include
cooling the measurement sample to, for example, liquid
nitrogen temperatures (approximately 77 K) or binding to a
large, relatively immobile target molecule.

Second, the net binder density is important and
typically is adjusted. The substrate preferably has an
adjustable number of binder synthesis sites or binding sites
per unit of substrate surface area. Too high a binder
density on the substrate surface will cause inter-molecular
nuclear dipole-dipole interactions to distort the REDOR
distance measurements. To obtain accurate intra-molecular
distances, the peptides should be kept sufficiently far apart
so that only intra-molecular nuclear dipole-dipole
interactions are significant. Inter-molecular nuclear
dipole-dipole interactions are preferably kept less than
about 10% of the intra-molecular interaction. In the case of
13C-15N measurements, this criterion can be monitored by
observing 13C-13C dipolar couplings. As the dipole interaction
falls off as R^-3, keeping adjacent binders apart by more than
approximately 2-3 times the distance to be measured is
sufficient. For measurements to 5 Å, this criterion can be
satisfied by keeping binders approximately 10 Å or more
apart. At a 10 Å spacing interfering 13C or 15N signals will
not exceed 2.8 Hz, which is sufficient attenuation for 30 Hz
or greater measurements.
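The R^-3 fall-off can be made concrete with a one-line ratio. This is a rough sketch under stated assumptions: a single interfering neighbor with the same nuclear pair as the measured one (so the gyromagnetic factors cancel), and a hypothetical function name.

```python
def intermolecular_fraction(measured_distance: float, neighbor_spacing: float) -> float:
    """Ratio of inter- to intra-molecular dipolar coupling for an R^-3 law.

    Both arguments must use the same length unit (e.g., Å).
    """
    return (measured_distance / neighbor_spacing) ** 3

if __name__ == "__main__":
    for spacing in (10.0, 12.5, 15.0):
        print(spacing, round(intermolecular_fraction(5.0, spacing), 3))
```

For a 5 Å measurement the single-neighbor ratio is about 12% at twice the measured distance and under 4% at three times, which brackets the approximately 10% target.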

In an embodiment wherein the binder is a peptide
comprising the sequence CX6C that is synthesized on an mBHA
resin that is also to serve as the NMR substrate, there is an
additional upper bound on the peptide density. To prevent
disulfide dimer formation in more than approximately 5% of
peptides, the peptides are preferably kept apart by at least
their average size. Dimer formation and incorrect disulfide
scaffolds result in unconstrained, flexible peptides of
altered structure distorting the REDOR distance determination
of the properly conformationally constrained, cyclized binder
peptides. A 10 Å or more separation will meet this
requirement. In this case, more than 95% of the disulfide
bonds will result in intended intra-molecular constraints.
This separation may be adjusted based on a determination of
actual dimer formation by chromatographic (e.g., HPLC) or
mass spectroscopic analysis of the peptide after cleavage
from the substrate (see Section 6.6, infra).

NMR instrumental sensitivity places a lower bound on
binder density. By way of example, for an adequate observed
signal to noise ratio using a preferred NMR spectrometer, no
less than approximately 10^18 observed nuclear spins should be
present in a 0.1 g sample. This translates to having a
binder density of no less than approximately 0.017 mmole/g (1
mmole = 10^-3 mole). For alternative NMR spectrometers with
higher field magnets (1H Larmor frequency of 500 MHz), the
binder density may be as low as 0.0017 mmole/g.
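The conversion from a minimum spin count to a minimum binder density is simple arithmetic, sketched below. The function name and the `labels_per_binder` parameter are hypothetical; the sketch assumes one observed labeled nucleus per binder.

```python
AVOGADRO = 6.02214076e23  # spins per mole

def min_binder_density_mmol_per_g(min_spins: float, sample_mass_g: float,
                                  labels_per_binder: int = 1) -> float:
    """Lowest binder loading (mmol/g) that supplies min_spins observed nuclei."""
    moles_of_binder = min_spins / (AVOGADRO * labels_per_binder)
    return moles_of_binder * 1.0e3 / sample_mass_g

if __name__ == "__main__":
    # 1e18 spins in a 0.1 g sample
    print(round(min_binder_density_mmol_per_g(1.0e18, 0.1), 4))
```

With 10^18 spins in 0.1 g this gives about 0.017 mmol/g, and a tenfold reduction in the required spin count gives about 0.0017 mmol/g.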

A third substrate condition to be considered is pore
size, which is relevant when measurement of binder bound to a
target molecule is desired. In a preferred method of
conducting such bound measurements, the substrate must have
sufficient pore size so that the target molecules can diffuse
to all binders on the surface of the substrate and bind to
them. For example, folded, moderate sized protein targets of
50 kD are typically roughly spherical with diameters of
approximately 50 Å. Preferable substrate pore sizes for use
with such moderate sized protein targets are no less than
100-200 Å. Excessive pore sizes can result in a too dilute
binder that decreases NMR signal intensity. The preferable
pore sizes also facilitate high purity peptide synthesis
directly onto substrate resins by similarly facilitating
diffusion of reagents and solvents to synthesis sites. Also,
binder substrate binding is preferably of such a nature that
it will not be disrupted under dry conditions, aqueous
conditions, or conditions suitable to binder-target binding.
Generally, adequate pore sizes are in the range of 100-500 Å,
although this will vary with the size of the target molecule.

Solid phase substrates that can be used include but are
not limited to mBHA resins, divinylbenzyl polystyrene resins,
and glass beads. All of these substances can be manufactured
to have binding sites in the range from 0 to 1.0 mmol/g. In
addition, these substrates can be made so as to have the
following surface areas: for mBHA about 100 m²/g, for
polystyrene from 50-100 m²/g, and for glass from 0.1-100 m²/g.
These substrates also can be manufactured so as to have a
surface binding site density in the range of from 0 to 1.0
mmol/m². More generally any microporous material with a
surface density of binding sites adjustable from 0 to at
least 1.0 mmol/m², and preferably with pore sizes in the
preferred ranges, can be used. Suppliers of such adjustable
resins include Chiron Mimotope Peptide Systems (San Diego,
CA) and Nova Biochem (San Diego, CA).

Peptide binders can be synthesized directly on the
surface of the substrates, by way of example as set forth in
Section 6.6 infra, to achieve a purity of preferably at least
90%, more preferably at least 95%. In the case of a peptide
comprising the sequence CX6C, the preferred peptide spacing on
the substrate is no closer than approximately 10 Å, or a
peptide density of no greater than one peptide every 100 Å².
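The spacing limit can be translated into a resin loading. The following is a minimal sketch assuming binders arranged on a square grid (one binder per spacing squared) and a uniformly accessible surface; the function name is hypothetical.

```python
AVOGADRO = 6.02214076e23
A2_PER_M2 = 1.0e20  # square ångströms in one square metre

def loading_mmol_per_g(spacing_angstrom: float, surface_m2_per_g: float) -> float:
    """Binder loading (mmol/g) for one binder per spacing**2 of surface."""
    area_per_binder_a2 = spacing_angstrom ** 2
    binders_per_g = surface_m2_per_g * A2_PER_M2 / area_per_binder_a2
    return binders_per_g / AVOGADRO * 1.0e3

if __name__ == "__main__":
    print(round(loading_mmol_per_g(10.0, 100.0), 3))
```

For a 10 Å spacing on a surface of 100 m²/g, this gives roughly 0.17 mmol/g as the upper loading under the square-grid assumption.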

Peptide synthesis on the preferred resin
p-methylbenzhydrylamine ("mBHA") with 0.16 mmole/g of peptide
binding sites, a surface of 100 m²/g, and a preferable pore
In a more preferred aspect, 95% purity is present. Methods
of producing such solid phase substrates, as described above,
are also provided.

In step 46, REDOR spectroscopy is performed on the
strategically labeled binder peptide-resin sample. Step 46
details include final sample preparation, spectrometer
parameters and tuning, and excitation pulse sequence. Sample
preparation can be carried out by standard methods. The
binder peptide-substrate sample is dried in N2, and an
approximately 0.1 g amount is sealed in the NMR measurement
rotor. The rotor can be cooled, if necessary, to limit
binder motion.

An alternative final sample preparation step is to bind
the target molecule to the binder peptide-resin sample and
then dry the complex in N2. Optionally, the binder peptide
can be split from the resin before binding to the target. In
this alternative, the highly accurate REDOR NMR distances are
of the bound binder and thus reflect any conformational
changes that occur upon binding with the target.

A triple resonance, magic angle spinning ("MAS") NMR
machine is adaptable to REDOR measurements. Such machines
are commercially available from Bruker (Billerica, MA),
Chemagnetics (Fort Collins, CO), and Varian (Palo Alto, CA).
An exemplary machine suitable for use is in the laboratory of
Prof. Zax, Cornell University (Ithaca, NY). This machine
includes a 7.05 Tesla magnet from Oxford Instruments (Oxford,
United Kingdom) and RF pulse excitation and receiving
hardware conventional in the NMR art. An exemplary
measurement rotor is a triple resonance, MAS probe from
Chemagnetics.

The exemplary magnetic field is adjusted for a 1H Larmor
frequency of 300 MHz, with corresponding Larmor frequencies
for 13C and 15N of 75.4 and 30.4 MHz, respectively. An
exemplary probe spin frequency (ω_r) is 4.8 kHz, with
corresponding rotor period (T_r) of 0.208 msec. 15N resonances
are measured. The low natural abundance of 15N eliminates the
need for natural background corrections. Alternatively, 13C
measurements can be done with conventional background
corrections.
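The exemplary frequencies and rotor period follow from the field strength and spinning rate. The sketch below assumes rounded textbook values of gamma/2pi in MHz per tesla; the function names are hypothetical.

```python
# Approximate gamma/(2*pi) values in MHz per tesla (magnitudes).
GAMMA_OVER_2PI_MHZ_PER_T = {"1H": 42.577, "13C": 10.708, "15N": 4.316}

def larmor_mhz(nucleus: str, field_tesla: float) -> float:
    """Larmor frequency (MHz) of a nucleus in a static field."""
    return GAMMA_OVER_2PI_MHZ_PER_T[nucleus] * field_tesla

def rotor_period_ms(spin_rate_khz: float) -> float:
    """Rotor period (ms) for a magic angle spinning rate in kHz."""
    return 1.0 / spin_rate_khz

if __name__ == "__main__":
    for nuc in ("1H", "13C", "15N"):
        print(nuc, round(larmor_mhz(nuc, 7.05), 1))
    print("T_r (ms):", round(rotor_period_ms(4.8), 3))
```

At 7.05 T these rounded ratios give about 300 MHz for 1H, 75.5 MHz for 13C and 30.4 MHz for 15N, and a 4.8 kHz spinning rate gives a rotor period of about 0.208 msec.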

REDOR is a pulse NMR technique requiring careful
excitation of appropriate 1H, 13C, and 15N resonances
synchronous with the MAS rotor and followed by observation of
the free induction decay. Many alternative REDOR
excitation sequences have been described in the literature,
some of which are found in the references cited hereinabove.
These sequences can involve multiple excitations per rotor
period. The simple pulse sequence preferred for use in this
invention requires only one 13C excitation per period.

The exemplary sequence for 8 rotor periods is
illustrated in Fig. 4, and is detailed herein in a manner
such that those skilled in the NMR arts can program an NMR
spectrometer for similar measurement. Three channels excited
are the 1H channel 50, the 13C channel 51, and the 15N
channel 52. The 13C and 15N RF power supplies are tuned to the
resonances of the nuclei whose distance is to be measured. The
1H channel RF power is initially tuned to the resonance of a
proton coupled to the 15N of interest. The time sequence
(increasing to the right) of the exciting signals (increasing
vertically) in each of these channels is illustrated.

In the 15N channel, an initial excitation is applied to
the 15N spins in either of two manners: either an initial π/2
pulse may be applied or, as illustrated and preferred, a
cross polarization transfer from the protons is made.
Sufficient RF intensity is applied at time 54 in both the 1H
and 15N channels, 50 and 52 respectively, to achieve a
Hartmann-Hahn precession match at a π spin flip time of 13.2
µsec. Subsequent to the initial 15N excitation, synchronous π
pulses 56 are applied in phase with the MAS probe rotor for
N_c rotor cycles, denoted by line 59, with sufficient RF
intensity to achieve a π spin flip time of 13.2 µsec. The
phase of these π pulses is varied systematically to reduce
artifacts in a manner well known in the NMR arts. The
preferred sequencing is detailed in Table 1.
Table 1. π Pulse Phase Sequencing

Number of rotor cycles        Phase sequence
between excitation and        (in precessing frame)
observation
----------------------        ---------------------
2                             YY
4                             XYXY
8                             XYXYYXYX

The phase sequence is expressed as the axis, in the frame
precessing with the 15N spins, about which the π spin flip is
made. This axis is systematically varied depending on the
number of rotor periods intervening between the 15N excitation
and signal observation. The illustrated phase sequences may
be varied into equivalent sequences in a conventional manner.
For example, "XYXY" is equivalent to "-YX-YX". Finally, at
501 the free induction decay of the 15N spins is observed and
generates the time domain output signal.

In the 1H channel, the preferred sequence is an initial
exciting π/2 pulse 53 followed with the previously described
cross polarization transfer 54 to the 15N spins. The less
preferred sequence omits these initial pulses in favor of a
π/2 excitation. During the subsequent spin evolution time
for N_c rotor cycles and the free induction decay time 501, a
decoupling field 55 is applied to the protons. The preferred
decoupling field has a 66 kHz RF intensity to achieve a π
spin flip in 7.6 µsec.

In the 13C channel, two distinct options must be
measured. The first option (not illustrated) has no 13C
exciting pulses. The second option (illustrated) has
synchronous π pulses 57 applied for N_c rotor cycles at the
rotor frequency but with a fixed phase delay 58, denoted by
t_1, and at signal intensity sufficient to achieve a
π spin flip time of 10.6 µsec. Any value of t_1 may be used;
the preferred value is 1/2 the rotor period, T_r/2.
Alternative REDOR pulse sequences include 2 or more 13C pulses
per rotor cycle.
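The quoted π spin flip times and the 13C pulse timing can be checked with a short calculation. This sketch assumes an ideal rectangular pulse, for which a π rotation in time t corresponds to a nutation frequency of 1/(2t); the function names and the `delay_fraction` parameter are hypothetical.

```python
from typing import List

def rf_field_khz(pi_pulse_us: float) -> float:
    """Nutation frequency (kHz) of an RF field that inverts spins in pi_pulse_us."""
    return 1.0e3 / (2.0 * pi_pulse_us)

def carbon_pulse_times_ms(n_cycles: int, rotor_period_ms: float,
                          delay_fraction: float = 0.5) -> List[float]:
    """Start-referenced times (ms) of one 13C pi pulse per rotor cycle.

    Each pulse is delayed by t_1 = delay_fraction * rotor_period_ms
    from the start of its rotor cycle.
    """
    t1 = delay_fraction * rotor_period_ms
    return [t1 + k * rotor_period_ms for k in range(n_cycles)]

if __name__ == "__main__":
    for t_us in (7.6, 10.6, 13.2):
        print(t_us, "us ->", round(rf_field_khz(t_us), 1), "kHz")
    print([round(t, 3) for t in carbon_pulse_times_ms(8, 1.0 / 4.8)])
```

Under this assumption a 7.6 µsec π pulse corresponds to about 66 kHz, 10.6 µsec to about 47 kHz, and 13.2 µsec to about 38 kHz.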

Summarizing still with reference to Fig. 4, a REDOR
measurement scan is characterized by the number of rotor
cycles, N_c, of spin evolution. A complete scan comprises,
first, an equilibration period, preceding the illustrated
pulse sequences. Second, there is a 15N excitation period
comprising pulses 53 and 54. Third, there is a spin
evolution period for N_c rotor cycles which has two options,
both measured. Both options comprise the application of
decoupling field 55 and synchronous in phase 15N π pulses
56. The first option has no 13C excitation; the second has
synchronous phase displaced 13C π pulses 57. Fourth, and
finally, there is observation of free induction decay 501 of
the 15N spins. Fig. 4 illustrates an N_c of 8. Each scan
option is repeated, and the induction decay signal
accumulated, for a sufficient number of times to obtain
acceptable signal to noise ratio. With the preferred
practice, this has required less than approximately 5,000
scans, and typically 3,000 have been sufficient.

An alternative implementation of the REDOR measurement
interchanges the roles of 13C and 15N and measures the free
induction decay of 13C. Further, the invention is not limited
to this described pulse sequence and is adaptable to
equivalent pulse sequences yielding direct inter-nuclear
dipole-dipole interaction strengths.

Following REDOR measurement step 46 is data analysis
step 47. This comprises several substeps. As is
conventional, the free induction decay signal is Fourier
transformed from the time domain to the frequency domain.
The scan option without the 13C excitation produces a
transformed signal with an observed 15N resonance peak of
magnitude S; the scan option with 13C excitation produces an
observed resonance peak of magnitude S_f. The REDOR output
signal, denoted ΔS/S, is conventionally formed according to
the equation:






WO 96/30849                                   PCT/US96/04229



$$\frac{\Delta S}{S} = \frac{S - S_f}{S} \qquad (2)$$



The output signal is observed for different N_c. Preferably 0,
2, 4, and 8 rotor cycles are observed. Other preferred N_c
will be apparent during the following description.

Further analysis of the REDOR output signal, ΔS/S, is
made clearer by a very brief explanation of how this output
signal represents the spin 1/2 dipole-dipole interaction
between the 13C and 15N. In the spin evolution period, the
decoupling excitation eliminates all proton effects from the
13C and 15N NMR spectra. Magic angle spinning, in the scan
option without any 13C excitation, eliminates all nuclear
dipole-dipole and chemical shift anisotropy from the NMR
line. Thus signal S represents an NMR resonance without any
dipole interaction. However, in the second scan option, the
13C π spin flip pulses reintroduce in a controlled manner the
dipole-dipole interaction. This interaction causes
additional dephasing, or loss of signal strength, in the
observed 15N signal. Thus signal S_f represents an NMR
resonance with dipole interaction and the output signal ΔS/S
represents the percentage strength of pure dipole-dipole
interaction between the 13C and 15N nuclei. The exact loss of
signal strength depends on the timing of the 13C pulses and
the number of rotor cycles for which they are applied.

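In the alternative where a general phase delay, t_1, is
used, the expression for the REDOR signal is derived by
numerically integrating the following equations from the Pan
et al. reference (1990, J. Magnetic Resonance 90:330-340):

$$\frac{\Delta S}{S} = 1 - \frac{1}{2\pi}\int_{0}^{\pi/2}\!\int_{0}^{2\pi} \cos\left[\Delta\Phi(\alpha,\beta)\right]\, d\alpha\, \sin\beta\, d\beta \qquad (3)$$

where ΔΦ(α,β) is the dipolar phase accumulated over the N_c
rotor cycles,

$$\Delta\Phi(\alpha,\beta) = N_c\left[\int_{0}^{t_1}\omega_D\,dt - \int_{t_1}^{T_r}\omega_D\,dt\right],$$

and

$$\omega_D(\alpha,\beta,t) = \pm\frac{D_{CN}}{2}\left[\sin^{2}\beta\,\cos 2(\alpha+\omega_r t) - \sqrt{2}\,\sin 2\beta\,\cos(\alpha+\omega_r t)\right] \qquad (4)$$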

This integration can be done by standard numerical
integration techniques such as are found in Press et al.,
Numerical Recipes: The Art of Scientific Computing,
Cambridge, U.K., Cambridge University Press (1986), chapter
4, which is herein incorporated by reference. Alternatively
the expression can be directly evaluated from the symbolic
representations by numerical tools such as Mathematica from
Wolfram Research Inc. (Champaign, IL) or Mathcad from
Mathsoft Inc. (Cambridge, MA). In a preferred embodiment,
however, a much simpler approach is used.
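For readers who want to see what such a numerical integration might look like, the following is a minimal sketch, not the routine of the cited references. It assumes the coupling is given in Hz (so the prefactor of the time-dependent dipolar frequency becomes pi*D), that the sign of the coupling reverses at the 13C pulse time t_1 and again at the end of each rotor period, and that a midpoint rule over the powder angles is adequate; all names are hypothetical.

```python
import math

def redor_dephasing_numeric(d_hz: float, rotor_period_s: float, n_cycles: int,
                            t1_fraction: float = 0.5,
                            n_alpha: int = 64, n_beta: int = 64) -> float:
    """Powder-averaged REDOR dephasing (delta S / S) for a general phase delay."""
    omega_r = 2.0 * math.pi / rotor_period_s
    t1 = t1_fraction * rotor_period_s

    def phase_integral(alpha: float, beta: float, t: float) -> float:
        # Antiderivative in t of the time-dependent dipolar frequency.
        a = alpha + omega_r * t
        return math.pi * d_hz * (
            math.sin(beta) ** 2 * math.sin(2.0 * a) / (2.0 * omega_r)
            - math.sqrt(2.0) * math.sin(2.0 * beta) * math.sin(a) / omega_r)

    total = 0.0
    weight = 0.0
    for i in range(n_alpha):
        alpha = 2.0 * math.pi * (i + 0.5) / n_alpha
        for j in range(n_beta):
            beta = 0.5 * math.pi * (j + 0.5) / n_beta
            # Phase over [0, t1] minus phase over [t1, T_r], times n_cycles;
            # the antiderivative is periodic in T_r, which gives the factor 2.
            dphi = 2.0 * n_cycles * (phase_integral(alpha, beta, t1)
                                     - phase_integral(alpha, beta, 0.0))
            w = math.sin(beta)
            total += math.cos(dphi) * w
            weight += w
    return 1.0 - total / weight

if __name__ == "__main__":
    # Hypothetical inputs: 100 Hz coupling, 1 ms rotor period, one cycle.
    print(redor_dephasing_numeric(100.0, 1.0e-3, 1))
```

Normalizing by the summed weights rather than by the analytic solid angle keeps the result exactly zero when the coupling is zero, which is a convenient sanity check on the quadrature.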

In the preferred embodiments the 13C pulse phase delay is
1/2 the rotor period, T_r, and the preceding equations can be
simply expressed (Mueller et al., 1995, J. Magnetic
Resonance, in press):

$$\frac{\Delta S}{S} = 1 - \left[J_0\!\left(\sqrt{2}\,\lambda\right)\right]^{2} + 2\sum_{k=1}^{\infty}\frac{\left[J_k\!\left(\sqrt{2}\,\lambda\right)\right]^{2}}{16k^{2}-1}, \qquad \lambda = N_c T_r D_{CN} \qquad (5)$$

where J_k is a Bessel function of the first kind. Adequate
accuracy is obtained by limiting the summation of equation 5
to its first five terms. Fig. 5 is a graph of this equation.
Vertical axis 61 represents ΔS/S; horizontal axis 62
represents λ; and graph 63 represents equation 5.
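A Bessel-function series of the kind described, truncated to five terms, can be evaluated with the standard library alone. The sketch below assumes the series form 1 - J_0(sqrt(2)*lambda)^2 + 2*sum_k J_k(sqrt(2)*lambda)^2/(16k^2 - 1) with a dimensionless lambda, and it computes J_k from its integral representation by the trapezoid rule rather than by the methods of Press et al.; the function names are hypothetical.

```python
import math

def bessel_j(k: int, x: float, n: int = 64) -> float:
    """J_k(x) from (1/pi) * integral_0^pi cos(k*tau - x*sin(tau)) dtau."""
    h = math.pi / n
    total = 0.5 * (1.0 + math.cos(k * math.pi))  # endpoint terms
    for i in range(1, n):
        tau = i * h
        total += math.cos(k * tau - x * math.sin(tau))
    return total * h / math.pi

def redor_dephasing(lam: float, n_terms: int = 5) -> float:
    """Delta S / S as a function of the dimensionless parameter lambda."""
    x = math.sqrt(2.0) * lam
    value = 1.0 - bessel_j(0, x) ** 2
    for k in range(1, n_terms + 1):
        value += 2.0 * bessel_j(k, x) ** 2 / (16.0 * k * k - 1.0)
    return value

if __name__ == "__main__":
    for lam in (0.0, 0.1, 0.5, 1.0, 2.0):
        print(lam, round(redor_dephasing(lam), 4))
```

For small lambda the result grows as (16/15) * lambda^2, which gives a quick check that the series has been coded correctly.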

In detail, step 47 of Fig. 3 uses equation 5 and the
REDOR output signal, ΔS/S, for various values of N_c to obtain
a best value for D_CN, the dipole interaction strength. The
internuclear distance is simply and directly determined from
D_CN by equation 1. An exemplary method for finding the best
value of D_CN is to use a least squares method. First, form
the sum of the squares of the differences of the observed
ΔS/S and ΔS/S computed from equation 5, which will be a
function of D_CN, T_r, and N_c through λ. Second, find the
value of D_CN minimizing this function by searching
exhaustively in sufficiently small increments over the relevant
range. For example, D_CN can be varied by varying R in 0.01 Å
increments from 0.5 to 8 Å. More efficient minimization
methods as presented in Press et al., chapter 10, can also be
used. Values of the Bessel functions can be simply calculated
by the methods in Press et al., supra, § 6.4. Alternatively,
this minimization and best value determination is easily
performed directly from the symbolic representations with the
previously cited mathematical packages.

The example in Section 6.6 provides typical results of
this measurement and analysis method.
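The exhaustive least squares search can be sketched as follows. This is an illustration under stated assumptions, not the patent's program: the 13C-15N coupling at 1 Å is taken as about 3063 Hz (from SI constants), lambda is formed as N_c * T_r * D with D in Hz, the dephasing curve uses the five-term Bessel series, and the "observed" values are synthetic numbers generated for a 4.0 Å distance rather than measured data. All names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

D_CN_AT_1A_HZ = 3063.3  # assumed 13C-15N coupling at 1 Å (Hz)

def bessel_j(k: int, x: float, n: int = 64) -> float:
    """J_k(x) by trapezoid rule on its integral representation."""
    h = math.pi / n
    total = 0.5 * (1.0 + math.cos(k * math.pi))
    for i in range(1, n):
        tau = i * h
        total += math.cos(k * tau - x * math.sin(tau))
    return total * h / math.pi

def redor_dephasing(lam: float, n_terms: int = 5) -> float:
    """Five-term Bessel series for delta S / S."""
    x = math.sqrt(2.0) * lam
    value = 1.0 - bessel_j(0, x) ** 2
    for k in range(1, n_terms + 1):
        value += 2.0 * bessel_j(k, x) ** 2 / (16.0 * k * k - 1.0)
    return value

def fit_distance(n_cycles: Sequence[int], observed: Sequence[float],
                 rotor_period_s: float, r_min: float = 0.5,
                 r_max: float = 8.0, step: float = 0.01) -> Tuple[float, float]:
    """Grid search over R (Å) minimizing the sum of squared residuals."""
    best_r, best_sse = r_min, float("inf")
    n_steps = int(round((r_max - r_min) / step))
    for i in range(n_steps + 1):
        r = r_min + i * step
        d_hz = D_CN_AT_1A_HZ / r ** 3
        sse = sum((obs - redor_dephasing(n * rotor_period_s * d_hz)) ** 2
                  for n, obs in zip(n_cycles, observed))
        if sse < best_sse:
            best_r, best_sse = r, sse
    return best_r, best_sse

T_R_S = 0.208e-3                      # rotor period from the exemplary setup
CYCLES: List[int] = [2, 4, 8]
# Synthetic "observations" for a 4.0 Å pair (not experimental data).
SYNTHETIC = [redor_dephasing(n * T_R_S * D_CN_AT_1A_HZ / 4.0 ** 3) for n in CYCLES]
BEST_R, BEST_SSE = fit_distance(CYCLES, SYNTHETIC, T_R_S)

if __name__ == "__main__":
    print("best R (Å):", round(BEST_R, 2), "SSE:", BEST_SSE)
```

The grid search is deliberately simple; a bracketing or gradient-based minimizer would converge faster but would need a starting interval, whereas the exhaustive scan cannot miss the global minimum on the grid.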

This completes the method of Fig. 3 and determines the
internuclear distance between the 13C and 15N nuclei to which
the excitation channels were tuned for the REDOR NMR
measurements. If other C-N pair distances are to be
determined in the labeled binder, step 46 as detailed above
is repeated for the other distinct resonances. If the
alternative 15N resonances cannot be distinguished, separately
labeled binders are prepared and measured.

5.7. CONSENSUS, CONFIGURATIONAL BIAS MONTE CARLO
Broad overview

With reference to Fig. 1, having found N specifically
binding members of one or more libraries, step 2, selected a
candidate pharmacophore shared by all these binders, step 3,
and determined a few strategic distances in the vicinity of
the candidate pharmacophore, step 4, precise pharmacophore
and binder peptide structures are now determined by the
preferred method, the consensus, configurational bias Monte
Carlo method. Other orderings and identities of these steps
are possible. For example, the binders may be predetermined
thereby rendering step 2 unnecessary. Further, no strategic
distance measurements may need to be made, and step 4 may be
omitted. Alternatively, a partial structure determination
step may be inserted before step 4 to guide selection of
distances for measurement.
Pharmacophore structure determination of this invention
is not limited to the CCBMC method to be described. CCBMC
makes the most efficient use of heuristic consensus binding
and partial distance measurement information. However, the
consensus pharmacophore can be determined by methods
including but not limited to use of exhaustive REDOR NMR
measurements or by extensive but fewer REDOR measurements in
conjunction with a conventional molecular structure
determination method, such as molecular dynamics,
conventional Monte Carlo, or even peptide folding rules.

In the following description, the CCBMC method is
broadly overviewed; subsequently, details of important steps
are described; and finally a description of the preferred
computer method and apparatus for practicing the invention is
given. From the description of the methods, equations, data
structures, and programs provided herein, one will be able
readily to translate them into implementations.

Although the following descriptions are directed to
binders isolated from the preferred library of peptides
comprising the sequence CX6C (constrained by disulfide bonds),
the method is applicable to more general organic diversity
library members. It is immediately applicable to compounds
from constrained peptide libraries with other scaffolds and
also to compounds from similar peptoid libraries. It will be
readily apparent that the method is applicable to any
compounds whose structural region of interest exhibits
conformational degrees of freedom at a temperature of
interest (e.g., body temperature 37 °C) that are limited to
torsional rotations of rigid molecular subunits about bonds
between the subunits, in which any loops present in the
structural region of interest are independently rotatable by
concerted rotation (see Section 7, Appendix: Concerted
Rotation). Examples of such compounds include but are not
limited to peptides, peptoids, peptide derivatives, peptide
analogs, etc., including members of libraries discussed in
Section 5.2, supra.

General features of Monte Carlo simulation methods are
known. A reference is Rowley, Statistical Mechanics for
Thermophysical Property Calculations, Englewood Cliffs, N.J.,
PTR Prentice Hall (1994), especially chapters 5 and 7, which
is herein incorporated by reference. The application of
simple Monte Carlo to constrained peptides has conventionally
been hindered by difficulty generating geometrically proper
and energetically useful conformational alterations, and by
the consequent wasteful and inefficient exploration of
conformational space. This method overcomes these problems
for constrained peptides with a novel combination of
techniques. In addition, this method is uniquely able to
incorporate partial information about binding affinities and
distance measurements to improve determination of the
pharmacophore structure, one goal of the invention.

Fig. 6 is an overview of the method. Step 91 represents
the initial geometric and chemical structure of each binding
peptide in computer memory. Peptide geometric structure is
represented as a set of records, each record representing one
rigid subunit or one atom of the peptide. The subunit
records are linked together as the subunits are linked in the
peptide molecule. Each rigid unit record includes fields for
the composition, structure, and connectivity of the rigid
unit represented. Since the rigid units only undergo
torsional rotations about mutual bonds, their internal
geometric structure is fixed.
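One way such linked rigid-unit records might be laid out is sketched below. This is a hypothetical illustration, not the data structure of the program listings: the class names, field names, and coordinate values are all assumptions chosen to mirror the composition, structure, and connectivity fields described above.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

Vec3 = Tuple[float, float, float]

@dataclass
class RigidUnit:
    """One rigid subunit (or single atom) with fixed internal geometry."""
    name: str                      # composition label (hypothetical)
    atoms: List[str]               # element symbols of the atoms in the unit
    local_coords: List[Vec3]       # fixed internal coordinates in Å
    bonded_to: List[int] = field(default_factory=list)  # connectivity

@dataclass
class Peptide:
    """A peptide as linked rigid units; only bond torsions may change."""
    sequence: str
    units: List[RigidUnit] = field(default_factory=list)
    torsions: Dict[Tuple[int, int], float] = field(default_factory=dict)

    def add_unit(self, unit: RigidUnit) -> int:
        self.units.append(unit)
        return len(self.units) - 1

    def link(self, i: int, j: int, torsion_deg: float = 180.0) -> None:
        """Record a rotatable bond between units i and j."""
        self.units[i].bonded_to.append(j)
        self.units[j].bonded_to.append(i)
        self.torsions[(min(i, j), max(i, j))] = torsion_deg

    def rotate_bond(self, i: int, j: int, delta_deg: float) -> None:
        """Change one torsion; internal unit geometry is left untouched."""
        key = (min(i, j), max(i, j))
        self.torsions[key] = (self.torsions[key] + delta_deg) % 360.0

# Illustrative use with placeholder geometry (not real coordinates).
peptide = Peptide(sequence="example")
a = peptide.add_unit(RigidUnit("amide", ["C", "O", "N", "H"],
                               [(0.0, 0.0, 0.0)] * 4))
b = peptide.add_unit(RigidUnit("alpha-carbon", ["C", "H"],
                               [(0.0, 0.0, 0.0)] * 2))
peptide.link(a, b)
peptide.rotate_bond(a, b, 60.0)
```

Keeping the torsion angles in a separate table, rather than inside each unit, reflects the stated constraint that the units themselves never deform and only rotations about the linking bonds are sampled.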

If a previous run with these peptides has been done,
peptide initial structure may be chosen as one of the
structures generated late in that run. Such an initial
structure is desirable since the effects of arbitrary initial
conditions have been eliminated. Alternatively, an initial
structure is generated from a prototypical backbone without
side chains by adding side chains with random torsional
orientations. For members of each type of diversity library,
a prototypical backbone meeting structural constraints and
representing an allowed configuration for a member possessing
no side chains can be defined. The prototypical backbone for
the CX6C library is generated from the CCBMC model itself as
run for the linear peptide C(Gly)6C (SEQ ID NO: 7) using a
Hamiltonian consisting only of the H_nmr term. The H_nmr term
contains only terms which, in the disulfide bond backbone
region -C1-S1-S2-C2-, limit the S1-S2 distance to 2.038 Å and
both the C1-S2 and the S1-C2 distances to 2.883 Å. When run
for a linear peptide, no Type II backbone moves are made.
Only Type I backbone moves, which remove and regrow randomly
selected portions of the backbone, are used to generate
backbone alterations. The model is run with temperatures
gradually decreasing from room temperature to a small
temperature, approximately 1 K. The final low temperature
structure is used for the prototypical backbone. Backbones
for similar constrained peptide libraries can be constructed
in similar manners.

In memory, for each peptide, a current structure is
represented; the initial current structures being the just
assigned initial structures. Also in memory is represented a
proposed modified structure for one peptide. At step 92 the
processor generates "moves" that transform the current
structure of a randomly chosen peptide into a proposed
modified structure. The moves mimic body temperature (37 °C)
thermal agitation experienced by the binders so that their
equilibrium structure may be determined.

Generation of these moves for conformationally
constrained peptides is an important aspect of this method.
There are two move types. Type I moves alter the
conformation of the side chain of a randomly chosen amino
acid of the randomly chosen peptide. The alteration is built
by side chain removal followed by side chain regrowth into a
new torsional conformation. During regrowth, unfavorable
overlap with neighboring side chains is avoided. Type II
moves alter the conformation of a limited random region of
the peptide backbone of a randomly chosen binder by

terms, and heuristic constraint terms. Conventional terms
include the energies of rigid unit torsional rotations and of
Lennard-Jones interactions, electrostatic interactions, and
H-bonding between atoms in different rigid units. Bond lengths
and angles are assumed fixed at the temperature of interest and
their energies constant. These conventional interactions are
exclusively intramolecular; no physical intermolecular
interaction effects are considered in this invention.
References for the conventional energies are Weiner et al.,
An all atom force field for simulations of proteins and
nucleic acids, J. of Computational Chem., 7:230-52 (1986);
and Weiner et al., A new force field for molecular simulation
of nucleic acids and proteins, J. Amer. Chem. Soc. 106:765
(1984) (herein referred to as the "AMBER references"), which
are herein incorporated by reference.

Another important aspect of the Monte Carlo method of
this invention is the heuristic terms: the consensus term and
the measurement constraint term. They uniquely make use of
partial information on the binder peptides to guide the Monte
Carlo simulation. The consensus term, $H_{consensus}$, is added to
the Hamiltonian to represent that all the binders do in fact
bind to the same protein target in the same physical and
chemical manner. Since binding occurs at the shared
candidate pharmacophore in each binder, this term makes
energetically unfavorable moves that cause the geometric
structure in the shared pharmacophore to depart from an
average, common structure. Pseudo chemical "bonds" to this
average structure are added which mimic the actual physical
bonding to the surface groups of the protein target. If the
candidate pharmacophore is in fact the actual pharmacophore,
this energy will become minimized and small in the
equilibrium configuration, since there will be an actual,
shared, geometric configuration. If the candidate
pharmacophore is not the actual one, this term will not
become minimized or small, as there is no physical reason for
this region of the peptide molecules to share a common
structure. This is the only Hamiltonian term which couples
the N binders together; no physical intermolecular effects
are considered. The binders are otherwise treated
independently by the method.

The measurement constraint term, $H_{NMR}$, is added to
represent the distance measurements made, which are in fact
actual distances in the molecules and constrain any simulated
structure. This term makes energetically unfavorable, by
adding pseudo chemical bonds of the measured lengths, moves
that cause the constrained internuclear distances to depart
from their measured values. Of course if no partial distance
measurements have been made or are otherwise available, this
term may simply be omitted from the Hamiltonian without
adversely affecting the practice of this step. Which
measurements to make, if any, is guided by the results of the
consensus structure determined. If an adequate structure can
be obtained without assistance of distance measurements, none
need be incorporated. If inadequate results are obtained,
additional iterations of the method will need distance
measurement inputs.

Step 94 tests the proposed structure against an
acceptance probability, accept(curr→prop). This acceptance
probability is determined by the energy of the proposed
structure previously computed in step 93. If the proposed
structure fails this test and is not accepted, the method
progresses immediately to step 96. If the proposed structure
meets the test and is accepted, the accepted proposed
structure replaces and becomes the current structure. The
proposed structure of this peptide is also saved (given
certain other conditions detailed later) in a separate memory
store of structures for later analysis. This structure store
is preferably on disk.

Repeated application of the concerted rotation may lead
to a slightly imperfect structure, due to numerical precision
errors. In an alternative embodiment, peptide geometry would
be restored to an ideal state by application of the Random
Tweak algorithm after several thousand moves (Shenkin et al.,
1987, Biopolymers 26:2053-85).

Step 96 tests whether enough structures of equilibrated
total energy have been generated in this simulation run. The
run terminates if a sufficient number have been generated.
Sufficiency is determined on the basis of whether the
statistical sampling error of the average pharmacophore
structure determined at step 97 is adequate (typically, less
than 0.25 Å). Preferably, 25,000 equilibrated structures
would be accumulated for each run. Also, preferably, three
runs would be performed for a total of 75,000 saved
structures.
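The sufficiency test of step 96 can be sketched as follows. This is a minimal illustration, not the listing of this specification: it assumes the sampling error is estimated as the standard error of the mean of each saved pharmacophore distance, and it treats saved structures as independent samples, which successive Monte Carlo structures generally are not. The function names and the dictionary layout are hypothetical.

```python
import math
import statistics


def sampling_error(samples):
    """Standard error of the mean of one saved distance (in angstroms).

    Assumes independent samples; correlated Monte Carlo samples would
    need a larger, correlation-corrected estimate.
    """
    n = len(samples)
    if n < 2:
        return math.inf
    return statistics.stdev(samples) / math.sqrt(n)


def run_is_sufficient(distance_samples, tolerance=0.25):
    """Return True when every tracked distance has error below tolerance.

    distance_samples maps a (hypothetical) distance label to the list of
    values of that distance over the saved equilibrated structures.
    """
    return all(sampling_error(values) < tolerance
               for values in distance_samples.values())
```

The 0.25 Å default follows the typical threshold stated above; a run would call `run_is_sufficient` after each batch of saved structures and terminate once it returns True.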

Fig. 9 illustrates energy equilibration of an actual
run. Axis 101 is the total energy of a set of peptide
binders; axis 102 is the number of moves accepted. Traces 103
represent total energies of all binders from each of the
three runs. Typically, run energy rapidly equilibrates
within less than approximately 2000 moves in most cases.
Subsequent saved structures are counted toward termination.
Traces 103 display typical energy variations superimposed on
a secular stability. The illustrated energy variations
typically comprise several components having different
variabilities. First, there is a very high frequency
oscillation with a period of a few tens of moves (known as
"hair"). Second, there is a low frequency oscillation with a
period of several hundred to a few thousand moves and with
low amplitude.

Step 97 analyzes the structures stored in memory. In the
simplest preferred embodiment, the stored geometric
structures for each binder are simply averaged, yielding a
final structure for each binder and for the candidate
pharmacophore. In another alternative, clustering software
seeks clusters of similar structures for each binder. The
clusters are then averaged to give a final structure for each
variant structure for each binder. The variants represent
alternative foldings for the binder. Exemplary clustering
methods are found in Gordon et al., Fuzzy cluster analysis of
molecular dynamics trajectories, Proteins: Structure,
Function and Genetics 14:249-264 (1992).

Alternative post-processing can be done on the clustered
structures to account for small bond angle vibrations. Such
vibrations are expected to make small perturbations to the
clustered structures determined by the Monte Carlo method and
can be accounted for by a brief molecular dynamics
simulation. Such a simulation is fully defined by the
Hamiltonian, comprising the physical and heuristic energies
to be described infra in Eqn. 8, and by the temperature of
interest. The structures observed during the simulation are
averaged to determine a final more accurate equilibrium
structure. A code capable of performing such a simulation is
Discover® from BIOSYM (San Diego, CA). Preferably, the
molecular dynamics simulation would be run for approximately
$10^4$ bond angle vibration periods. Since the typical bond
angle vibration period is $10^{-1}$ ps (1 ps = $10^{-12}$ sec), such a
run will encompass approximately 1 ns of molecular time.

Configurational bias move generation details

One Type I or II move will, in general, alter the
position of several rigid units on a side chain or along the
backbone. Each altered rigid unit is sequentially considered
during move generation. The Hamiltonian describing the
energy of the rigid unit currently being considered in a move
is divided into an internal, $u^{int}$, and an external, $u^{ext}$, part,
where $u^{ext}$ is all energy not included in $u^{int}$. In the preferred
embodiment, $u^{int}$ is set to 0; an alternative choice would be
to include only the torsional interaction energy between this
rigid unit and units to which it is currently bound. $u^{int}$
generates a probability distribution, $p^{int}$, according to which
is generated a set, $\phi_k$, $k = 1, \ldots, K$, of candidate torsional
angles for the bond between the rigid unit being examined and
rigid units already examined. $u^{ext}$ generates another
probability distribution, $p^{ext}$, according to which is selected
one torsional angle from the prior set as the proposed new
angle for the rigid unit being examined. These probabilities
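are defined by the equations:

$$
p_i^{int}(\phi_{i,k}) \propto \exp[-\beta u_i^{int}(\phi_{i,k})], \qquad
p_i^{ext}(\phi_{i,k}) = \frac{\exp[-\beta u_i^{ext}(\phi_{i,k})]}{w_i^{ext}}, \qquad
w_i^{ext} = \sum_{k=1}^{K} \exp[-\beta u_i^{ext}(\phi_{i,k})]
\tag{6}
$$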

In this equation, $i$ signifies the rigid unit being
considered, $K$ is the total number of candidate torsional
angles generated by $p^{int}$, and $\beta = 1/kT$ ($k$ is Boltzmann's
constant; $T$ the temperature, preferably 37 °C). The overall
probability of generating a transition from the current to
the proposed structure and the probability of accepting the
proposed structure are given by the equations:



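$$
gen(curr \to prop) = \prod_{i=1}^{M} p_i^{int}(\phi_i)\, p_i^{ext}(\phi_i), \qquad
accept(curr \to prop) = \min\left(1, \frac{W^{new}}{W^{old}}\right)
\tag{7}
$$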



In this equation, $M$ is the total number of rigid units added
in the move. $W^{old}$ is a weight for the reverse move and will
be described subsequently.
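The selection of one torsional angle and the acceptance test can be sketched as below. This is an illustrative sketch under stated assumptions, not the program listing of this specification: it takes $u^{int}$ = 0, so candidates are drawn uniformly; it assumes energies in kcal/mole; and the function names and the `u_ext` callable are hypothetical.

```python
import math
import random

K_BOLTZMANN = 0.0019872              # kcal/(mol K); assumes energies in kcal/mol
BETA = 1.0 / (K_BOLTZMANN * 310.15)  # 1/kT at 37 degrees C


def generate_candidates(k, rng=random):
    """Draw k candidate torsional angles from p_int with u_int = 0 (uniform)."""
    return [rng.uniform(-math.pi, math.pi) for _ in range(k)]


def select_torsion(candidates, u_ext, beta=BETA, rng=random):
    """Select one candidate with probability exp(-beta * u_ext) / w_ext.

    u_ext is a caller-supplied function giving the external energy of the
    rigid unit at a candidate angle. Returns the chosen angle and the
    normalization w_ext, which is needed later for the move weight.
    """
    boltzmann = [math.exp(-beta * u_ext(phi)) for phi in candidates]
    w_ext = sum(boltzmann)
    chosen = rng.choices(candidates, weights=boltzmann, k=1)[0]
    return chosen, w_ext


def acceptance(w_new, w_old):
    """Acceptance probability min(1, W_new / W_old)."""
    return min(1.0, w_new / w_old)
```

Multiplying the `w_ext` values returned for each regrown rigid unit gives the weight of the whole move, which is then passed to `acceptance` together with the weight of the reverse move.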

Because energy is included in the generation
probabilities, proposed structures are preferentially of
lower energy. Since the acceptance of proposed structures
depends on their energies, proposed structures are thereby
more likely to be accepted.

Peptide memory representation details

It is well known that at body temperature peptides
consist of linked rigid units capable only of torsional
rotation about mutual bonds whose lengths and angles are
fixed. The torsional rotations respect any molecular
conformational constraints. See Cantor et al., Biophysical
chemistry part I the conformation of biological
macromolecules, New York, W.H. Freeman and Co. (1980), which
is herein incorporated by reference. Table 2 lists the rigid
units encountered in the preferred embodiment of this
invention utilizing libraries of conformationally constrained
peptides. Table 2, where applicable, also lists dihedral
bond angles between incoming and outgoing bonds to a rigid
unit and the assigned unit type.

Table 2

Type  Chemical structure  Bond angle (if applicable)

Backbone and side chain rigid units
A     -NH2
B     -CαH-               70.5°
C     -CONH-              70.5°
D     -COOH

Side chain only rigid units
E     -CH2-               70.5°
F     -CH-                70.5°
G     -S-                 70.5°
H                         0°
I     -CH3
J     -OH
K     -SH
L     -NH2
M
N     -CONH2
O
P     -C3N2H3
Q

Table 3 illustrates the decomposition of all amino acid side 
chains into rigid units. Glycine is a special case, without 
a side chain. Proline is a special case with a side chain 
cyclically bonded to the backbone amino N. 



Table 3

Amino acid      Rigid units

Glycine         -CαH2- (special case)
Alanine         -CH3
Arginine        -CH2-CH2-CH2-CN3H4
Aspartate       -CH2-COOH
Asparagine      -CH2-CONH2
Cysteine        -CH2-SH
Glutamate       -CH2-CH2-COOH
Histidine       -CH2-C3N2H3
Isoleucine      -CH(-CH3)-CH2-CH3
Leucine         -CH2-CH(-CH3)2
Lysine          -CH2-CH2-CH2-CH2-NH2
Methionine      -CH2-CH2-S-CH3
Phenylalanine   -CH2-C6H5
Serine          -CH2-OH
Threonine       -CH(-CH3)-OH
Tryptophan      -CH2-C8NH6
Valine          -CH(-CH3)-CH3
Tyrosine        -CH2-C6H4-OH



Fig. 10 illustrates a structurally correct but
geometrically inaccurate decomposition of the peptide
backbone CX6C into rigid units (inessential hydrogens have
been omitted). Rigid units are set off in boxes 121 and
their types 122 are indicated. Fig. 11 illustrates a
structurally correct but geometrically inaccurate
decomposition of the peptide backbone and side chains of
-arginine-glycine-aspartate- ("RGD") into rigid units. Rigid
units are set off in boxes 131 and their types 132 are
indicated.

Rigid units are represented as records in memory. The
data structure for a peptide comprises records for its
constituent rigid units linked together by data pointers
exactly as the actual rigid units in the peptide are
chemically linked. The record representing a rigid unit
comprises fields for: type of the unit, pointers to
chemically bonded units, all atoms of the unit and their
spatial positions, atoms of the unit that are the target of
the incoming and outgoing bonds, amino acid to which the unit
belongs, and atomic composition of the unit.

A known, conventional representation of atoms and atomic
interactions is taught by the AMBER references. Each atom is
divided into a series of subtypes of specific properties.
For example, for carbon there are subtypes C, C2, CA, CT,
etc.; for nitrogen, there are N, N2, etc.; for oxygen, there
are O, O2, etc.; and for hydrogen, there are H, H2, etc.
Bonds between each pair of subtypes are separately
characterized by equilibrium lengths, angles, and torsional
energies. Interactions between each pair of subtype atoms
are separately characterized by Lennard-Jones force
parameters, hydrogen bonding force parameters, and
electrostatic charges. Amino acid charge distributions are
in Weiner et al., J. of Computational Chem., 7:230-52
(1986).

Thus each atom in each rigid unit is represented by an
in-memory record comprising fields for: its AMBER reference
subtype and any electrostatic charge. The atom's spatial
position relative to its containing rigid unit, stored in
that unit's record, is geometrically determined from the
unit's internal chemical structure and bonds by the AMBER
bond lengths and angles defined for each of these bonds. The
relative spatial positions of atoms within a rigid unit are,
of course, fixed, and there is no interaction energy to
consider between atoms within a rigid unit.
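The rigid unit and atom records described above might be laid out as in the following sketch. It is illustrative only and is not the data structure of the program listings: all class, field, and function names are hypothetical, the `element` field is an added convenience for computing atomic composition, and the AMBER subtype strings in any usage are examples rather than assigned parameters.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Atom:
    element: str          # chemical element, e.g. "C" (added for composition)
    amber_subtype: str    # AMBER reference subtype, e.g. "CT"
    charge: float = 0.0   # electrostatic charge, if any


@dataclass
class RigidUnit:
    unit_type: str        # Table 2 type letter
    amino_acid: str       # amino acid to which the unit belongs
    atoms: List[Atom] = field(default_factory=list)
    # Atom positions relative to the unit, one tuple per atom.
    positions: List[Tuple[float, float, float]] = field(default_factory=list)
    incoming_atom: Optional[int] = None  # index of atom targeted by incoming bond
    outgoing_atom: Optional[int] = None  # index of atom carrying outgoing bond
    # Pointers to chemically bonded units; excluded from repr and comparison
    # because linked units refer back to each other.
    bonded: List["RigidUnit"] = field(
        default_factory=list, repr=False, compare=False)

    def composition(self) -> Counter:
        """Atomic composition of the unit as element counts."""
        return Counter(atom.element for atom in self.atoms)


def link(first: RigidUnit, second: RigidUnit) -> None:
    """Link two records as the corresponding units are chemically linked."""
    first.bonded.append(second)
    second.bonded.append(first)
```

Linking records in both directions lets a move walk outward from any rigid unit along a side chain or the backbone, at the cost of having to exclude the pointer list from the generated comparison and printing methods.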

Fig. 11 is a complete memory representation of a
tripeptide sequence -RGD- (a known pharmacophore). Rigid
units are set off in boxes 131 and their types 132 are
indicated. The torsional degrees of freedom between the
rigid units are indicated by angle arrows 133. AMBER atom
types are indicated as at 134. Net atomic charges are
indicated only for arginine as at 135. Rigid unit records
are linked into a data structure modeling the rigid units'
physical linkages. Not shown are relative atomic spatial
positions represented by the atoms' rectangular coordinates.

All parameters defining the AMBER atomic representations
and interatomic forces can be found in Weiner et al., J. of
Computational Chem., 7:230-52 (1986), and Weiner et al., J.
Amer. Chem. Soc., 106:765 (1984). Conventionally, these
parameters are obtained from computer readable files from
commercial sources. The preferred computer readable source
of these parameters is from Insight II® 2.3.5 software from
BIOSYM (San Diego, CA). Other sources are Tripos (St. Louis,
MO) and CHARMm (Molecular Simulations, Inc., Burlington, MA).



Interaction energy evaluation details 

The form of the intramolecular energy, or Hamiltonian,
evaluated at step 93, is an important element of this
invention. The Hamiltonian consists of the components:

$$
H = \sum_{I=1}^{N_{binders}} H_{I,total}, \qquad
H_{I,total} = H_{I,molecular} + H_{I,NMR} + H_{I,consensus}
\tag{8}
$$

The $H_{I,molecular}$ component is determined from the Weiner et al.
references, J. of Computational Chem., 7:230-52 (1986), and
J. Amer. Chem. Soc., 106:765 (1984).



$$
H_{I,molecular} =
\sum_{\substack{\text{rigid unit}\\ \text{torsional angles } i}} \sum_{n}
  \frac{V_{i,n}}{2}\left[1 + \cos(n\phi_{I,i} - \gamma_{i,n})\right]
+ \sum_{\text{atom pairs } i<j}
  \left[\frac{A_{ij}}{R_{I,ij}^{12}} - \frac{B_{ij}}{R_{I,ij}^{6}}\right]
+ \sum_{\text{atom pairs } i<j} \frac{q_i q_j}{\epsilon R_{I,ij}}
+ \sum_{\text{H-bond pairs}}
  \left[\frac{C_{ij}}{R_{I,ij}^{12}} - \frac{D_{ij}}{R_{I,ij}^{10}}\right]
\tag{9}
$$

Here, $\phi_{I,i}$ is the i'th torsional angle between rigid units of
the I'th binder peptide, and $R_{I,ij}$ is the interatomic distance
between the i'th and j'th atoms in different rigid units of
the I'th binder. The first term in this equation is the
torsional energy of rigid units; the second is the
interatomic Lennard-Jones energy; the third is the interatomic
electrostatic energy; and the fourth is the interatomic
hydrogen bond energy. Rigid unit torsional rotations
directly change the first term. Such rotations indirectly
change all other terms as interatomic distances change.

The AMBER parameters $V_{i,n}$, $\gamma_{i,n}$, $A_{ij}$, $B_{ij}$, $q_i$, $C_{ij}$ and
$D_{ij}$ are obtained as stated above. The effect of water is
approximated in a known manner by setting $\epsilon$ equal to
$4r\epsilon_0$, where $r$ is the distance (in Å) in the electrostatic
term and $\epsilon_0$ is the vacuum permittivity.

The distance constraint term, as described, makes
energetically unfavorable moves which cause those measured
interatomic separations in the simulation to depart from
their measured values. If no measured values are available,
this term is simply omitted from the Hamiltonian. Since this
is not a physical energy and in simulation equilibrium the
binders should have the measured distance, it is advantageous
that this term should make only a small contribution to the
equilibrium energy, no more than 10% of the total energy and
preferably approximately 2.5 to 5%. Further, it is
advantageous that the energetic disfavor be weighted by the
confidence in the measurements, so that measurements having
more confidence have a greater effect.


Many forms of this energy meet these criteria. The
preferred form is:

$$
H_{I,NMR} = \sum_{\text{measured pairs } ij}
  w_{I,ij} \left(R_{I,ij} - R^{(0)}_{I,ij}\right)^2
\tag{10}
$$

where $R^{(0)}_{I,ij}$ is a measured distance in the I'th binder peptide
between atomic pair ij. This makes the constraints appear as
an elastic pseudo-bond with equilibrium length as measured.
The $w_{I,ij}$ are weights designed to meet the above size criteria.
In the preferred embodiment, they are calculated with an
overall multiplicative factor limiting the contribution of
$H_{I,NMR}$ to no more than approximately 5% of the total
equilibrated energy. Their relative value is selected to
reflect the lower reliability of longer measurements. Thus
if $R^{(0)}_{I,ij}$ is between 0 and 3 Å, $w_{I,ij}$ has a relative value of 1;
if the measurement is between 3 and 4.5 Å, the relative value
is 2; if between 4.5 and 7 Å, the value is 3; and if the
distance exceeds 7 Å, the term is dropped from the sum.
Other alternative weight assignments meeting the general
criteria are clearly possible.
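The relative weight assignment just described can be sketched as follows. This is a hedged illustration, not the program listing: it assumes the distance constraint energy is a weighted harmonic (elastic pseudo-bond) sum, it assumes which side of each boundary (3, 4.5 and 7 Å) belongs to which range since the text does not say, and the function names, the `scale` parameter, and the dictionary layout are hypothetical.

```python
def relative_weight(measured):
    """Relative weight for a measured distance in angstroms.

    Returns None when the distance exceeds 7 angstroms, meaning the term
    is dropped from the sum. Boundary handling is an assumption.
    """
    if measured > 7.0:
        return None
    if measured > 4.5:
        return 3.0
    if measured > 3.0:
        return 2.0
    return 1.0


def h_nmr(current, measured, scale=1.0):
    """Distance constraint energy for one binder.

    current and measured map an atom pair to its simulated and measured
    distance; scale stands in for the overall multiplicative factor that
    limits this term to a small share of the total energy.
    """
    total = 0.0
    for pair, r0 in measured.items():
        weight = relative_weight(r0)
        if weight is None:
            continue  # measurement too long to be used
        total += scale * weight * (current[pair] - r0) ** 2
    return total
```

For example, with measured distances of 2.0, 5.0 and 8.0 Å and simulated distances of 2.5, 4.0 and 20.0 Å, only the first two pairs contribute, with relative weights 1 and 3.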

The consensus constraint term, as described, makes
energetically unfavorable moves which cause the candidate
pharmacophore in each of the binders to depart from an
average, shared configuration. In simulation equilibrium
when the candidate is the actual pharmacophore, the binders
share the pharmacophore structure and this term should be
small. Since this is not a physical energy, in the case
where the candidate pharmacophore is correct, this term
should not be large compared to the total energy, in
equilibrium no more than 10% of the total energy, and
preferably approximately 5%. Further, the energetic disfavor
should preferably be weighted by the affinity of each binder
for the protein target, so that binders with greater affinity
have a greater energetic effect.

Many forms of this energy meet these criteria. The
preferred form is:

$$
H_{I,consensus} = \sum_{\substack{\text{pharmacophore}\\ \text{distance pairs } ij}}
  w^{(c)}_{I} \left(R_{I,ij} - R^{(c)}_{ij}\right)^2
\tag{11}
$$






WO 96/30849



PCT/US96/(M229 



$R^{(c)}_{ij}$, the shared consensus structure for the candidate
pharmacophore, is an average of the interatomic distances
between corresponding atomic positions, ij, in the shared
pharmacophore in all binders. This makes the constraints
appear as pseudo-bonds to a shared pharmacophore, which
represents the binding to the protein target. The $w^{(c)}_{I}$ are
weights designed to meet the above size criteria. In the
preferred embodiment, they are calculated with an overall
multiplicative factor limiting the contribution of $H_{I,consensus}$ to
no more than approximately 5% of the total equilibrated
energy. Their relative value is selected to reflect that
binders with lower affinity are less reliable indicators of
actual pharmacophore structure. Thus the relative value of
the weights is proportional to the logarithm of the affinity
of the corresponding binder, with an affinity of 1 μmolar
having a relative weight of 1. Other weight assignments
meeting the general criteria are clearly possible. The
heuristic $H_{consensus}$ is the only Hamiltonian term linking
together the various binders.
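The consensus term can be sketched as follows. This is an illustrative sketch rather than the program listing: it assumes a weighted sum of squared deviations from the plain average of each pharmacophore distance over all binders, takes the per-binder weights as given rather than deriving them from affinities, and uses hypothetical function names and a hypothetical dictionary layout.

```python
def consensus_distances(binders):
    """Average each pharmacophore distance over all binders.

    binders is a list with one dictionary per binder, mapping a
    pharmacophore atom pair to its current interatomic distance.
    """
    count = len(binders)
    return {pair: sum(b[pair] for b in binders) / count
            for pair in binders[0]}


def h_consensus(binders, weights):
    """Total consensus energy over all binders.

    weights holds one relative weight per binder; their derivation from
    binding affinities is not modeled here.
    """
    average = consensus_distances(binders)
    total = 0.0
    for distances, weight in zip(binders, weights):
        total += weight * sum((distances[pair] - average[pair]) ** 2
                              for pair in average)
    return total
```

Because the average is recomputed from all binders, a move in one binder changes the consensus energy of every binder, which is how this term couples otherwise independent peptides.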

All Hamiltonian components change only due to the
dependence of the interatomic distances, $R_{I,ij}$, on the rigid
units' torsional rotation. The $R_{I,ij}$ are the well known
Euclidean distances between the atomic coordinates stored in
the rigid unit records. Calculation of coordinate changes
due to rotation of angle $\phi$ about a bond with unit direction $\underline{n}$
originating at atom A with position $\underline{x}$ is well known, but will
be detailed. (Throughout, symbols representing vector
quantities are indicated by underlining.) First, translate
from the current coordinate origin to an origin at position $\underline{x}$
by subtracting $\underline{x}$ from all relevant coordinate vectors. Second,
apply a rotation matrix, $T$, to the atomic coordinate vectors.
Third, translate back to the prior coordinate origin from $\underline{x}$
by adding $\underline{x}$ to all relevant coordinate vectors. A
rotation matrix is given by:

$$
T = \cos(\phi)\, I + \underline{n}\,\underline{n}^{T}\,[1 - \cos(\phi)] + M \sin(\phi),
\qquad
M = \begin{pmatrix} 0 & -n_z & n_y \\ n_z & 0 & -n_x \\ -n_y & n_x & 0 \end{pmatrix}
\tag{12}
$$

A reference for this computation is Goldstein, Classical 
mechanics , Massachusetts, Addison-Wesley (1981), especially 
chapter 4, which is herein incorporated by reference. 
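The three-step rotation about a bond can be sketched as below. This is a minimal illustration under an assumed sign convention: the rotation is taken as counterclockwise about the unit direction when viewed looking back along it (right-hand rule), which may differ in sign from the convention of the cited reference. The function name and argument layout are hypothetical.

```python
import math


def rotate_about_bond(points, origin, axis, phi):
    """Rotate points by angle phi about a bond through origin.

    axis must be a unit vector along the bond. Each point is shifted so
    that origin becomes the coordinate origin, rotated, and shifted back.
    """
    nx, ny, nz = axis
    c, s = math.cos(phi), math.sin(phi)
    t = 1.0 - c
    # T = cos(phi) I + n n^T (1 - cos(phi)) + M sin(phi)
    rot = [
        [c + nx * nx * t, nx * ny * t - nz * s, nx * nz * t + ny * s],
        [ny * nx * t + nz * s, c + ny * ny * t, ny * nz * t - nx * s],
        [nz * nx * t - ny * s, nz * ny * t + nx * s, c + nz * nz * t],
    ]
    rotated = []
    for point in points:
        shifted = [point[i] - origin[i] for i in range(3)]
        turned = [sum(rot[i][j] * shifted[j] for j in range(3))
                  for i in range(3)]
        rotated.append(tuple(turned[i] + origin[i] for i in range(3)))
    return rotated
```

Rotating the point (2, 1, 0) by a quarter turn about a bond along z through (1, 1, 0) moves it to (1, 2, 0), and interatomic distances among rotated points are unchanged.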

Type I move generation 

Type I moves alter side chain structure of a randomly
chosen amino acid in a randomly chosen binder. These random
choices are conventionally made by a random number
subroutine. The chosen side chain is "removed" from the
binder peptide and "grown" back rigid unit by rigid unit.
For the next, i'th, rigid unit to be added, $K$ possible new
torsional angles are generated according to $p^{int}$. Preferably
$K$ is from 10 to 100. One of these torsional angles is
selected according to $p^{ext}$, and the rigid unit is added at
this new angle. Determination of $p^{ext}$ requires obtaining the
normalization $w^{ext}$. At each step the $u^{int}$ and $u^{ext}$ used to
calculate the respective probabilities include only
interaction energies with rigid units present in other amino
acids or already grown back. Rigid units not yet added are
ignored. After all the side chain rigid units have been
added back, $W^{new}$ is computed as the product of the
normalization factors.

Fig. 12 illustrates a Type I move for glutamate. At 141
the side chain has been removed. The first -CH2- unit is
added back at 142 with new torsional angle $\phi_0$. The generation
according to $p^{int}$ and selection according to $p^{ext}$ of this angle
ignores energy interactions with the other side chain rigid
units not yet added. At 143, the next -CH2- rigid unit is
added back at angle $\phi_1$. Finally at 144, the last -CO2 rigid
unit is added at angle $\phi_2$. For this last step interaction
energies with all the rigid units are considered in
generating and selecting the new angle.

$W^{old}$ is the weight for the reverse move, the move from
the proposed new structure to the current configuration. For
this, the proposed side chain is removed and regrown in its
current structure unit by unit. For the next, i'th, unit
generate $K-1$ possible new torsional angles according to $p^{int}$,
again ignoring interactions with units yet to be added. The
K'th new angle is the current angle for that unit. The
current torsional angle is selected. Although $p^{ext}$ is not
used, normalization $w^{ext}$ is determined. After all units have
been regrown at the current angles, $W^{old}$ is computed as the
product of the normalizations.

The acceptance probability for the proposed side chain
configuration is determined from equation 7 using $W^{new}$ and $W^{old}$.

Type II move generation 

Type II moves alter a limited region of the amino acid
backbone beginning at a randomly chosen backbone rigid unit
of a randomly chosen binder peptide in a manner consistent
with conformational constraints due to internal disulfide
bonds. These random choices are made similarly to those for
Type I moves.

In Type II moves, side chains attached to the altered
rigid units move rigidly with their backbone rigid units.

For this move, important geometric constraints must be
met. In a randomly chosen binder and at a randomly chosen
backbone bond between adjacent rigid units, a torsional angle
rotation by $\phi_0$ is made. Subsequent backbone torsional
rotations are chosen so that a minimum number of rigid units
undergo a spatial displacement. This constraint fixes a
limited number (if any) of possible subsequent torsional
angles as a function of $\phi_0$ so that at most 4 rigid units are
spatially displaced and rotated with at most 3 additional
rigid units undergoing a rotation. This move is an important
aspect of this invention and is required to maintain the
conformational constraint due to the disulfide bridge. Since
only 7 rigid units are spatially modified, the Type II move
preserves the 8 amino acid cycle (20 rigid units), including
the cystine side chain.
Fig. 13 illustrates a Type II move of a poly-glycine
7-mer. Rigid unit positions are indicated generally by black
circles as at 1509 with incoming bonds generally as at 1502.
A Cα rigid unit (B unit) is illustrated in box 1515, and an
amide bond (C unit) in box 1516. Backbone structure 1500 is
transformed into structure 1501 by the Type II move generated
by an initial rotation about bond 1502. Subsequent rotations
about bonds 1503, 1504, 1505, 1506, 1507, and 1508 are
thereby determined so that the rigid unit 1510 and at most
three subsequent units undergo only a rotation without any
spatial displacement. The four rigid units between units
1509 and 1510 undergo both a spatial displacement and a
rotation as structure 1500 is transformed to structure 1501.
No other backbone rigid units are altered.

The derivation of these assertions, including
expressions for the allowed angles, is in Section B.
Appendix: Concerted Rotation. Fig. 14 defines notation used
in this Appendix: Concerted Rotation. Poly-glycine 7-mer
backbone 1600 is the same as in Fig. 13. Rigid unit
positions are indicated generally by black circles as at 1601
with incoming bonds generally as at 1602. The torsional
rotations $\phi_0$ to $\phi_6$ are about bonds 1602 to 1608, respectively,
between sequential, adjacent rigid units. The rigid unit
position vectors $\underline{r}_0$ to $\underline{r}_6$, illustrated as vectors 1610 to
1616, respectively, define the position of these sequential
rigid units with respect to a laboratory coordinate system
with origin 1609. Summarizing this Appendix, the
determination of the fixed torsional angles proceeds as
follows. The allowed values for $\phi_1$ are the roots of equation
34, which depends on the $\phi_0$ driver angle and $\phi_2$ through $\phi_4$.
But $\phi_2$ through $\phi_4$ can be determined in terms of $\phi_1$. Two
solutions for $\phi_2$ are determined by equation 25 in terms of $\phi_1$.
Two solutions for $\phi_3$ are determined by equation 29 in terms of
the preceding $\phi$'s. Finally, a simple inversion of equation
32 determines one solution for $\phi_4$ in terms of the preceding
$\phi$'s. Having found the allowed values of $\phi_1$, then equations
25, 29, and 32 determine corresponding allowed values for the
other $\phi$'s, which in turn determine the alteration of the
first four rigid units caused by the $\phi_0$ initial rotation.

More precisely, final torsional angles $\phi_0$ to $\phi_4$ determine
position vectors $\underline{r}_1$ to $\underline{r}_4$ by applying rotation matrix 18 to
equations 17 to obtain new position vectors in the laboratory
coordinate system, the rotation matrices of equations 16 and
18 being determined by these final torsional angles.
Position vectors $\underline{r}_0$ and $\underline{r}_5$ to $\underline{r}_6$ do not change. Then rigid
unit 0 is translated to position $\underline{r}_0$; aligned so that its
incoming bond axis is along the direction of the outgoing
bond of unit -1; and finally rigidly rotated so that the end
of its outgoing bond is at position $\underline{r}_1$. Rigid unit 1 is then
translated to position $\underline{r}_1$; aligned so that its incoming bond
axis is along the outgoing bond of unit 0; and rigidly
rotated so that the end of its outgoing bond is at position
$\underline{r}_2$. Rigid units 2 to 6 are then added to the backbone in a
similar fashion. In this fashion the Type II move geometry
is determined. Any side chains attached to these rigid units
are rigidly rotated when their parent unit is rotated.

The Type II rotation is chosen in the following manner.
Using the configurational bias prescription, the Hamiltonian
is divided into $u^{int}$ and $u^{ext}$. $u^{int}$ is preferably 0, or
alternatively is the torsional energy associated with the
rigid unit of interest, while $u^{ext}$ includes all remaining
interaction energies. In the previous manner, $u^{int}$ determines
$p^{int}$ according to which are generated $K'$ candidate $\phi_0$ rotation
angles. Preferably $K'$ is 1. Then the geometric constraints
are solved for each candidate $\phi_0$. Typically, but not always,
$6K'$, denoted $R$, possible backbone alterations are obtained.
One of these is selected by p~*^, determined by: 
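$$
p^{\mathrm{ext}}_{k} = \frac{\exp(-\beta u^{\mathrm{ext}}_{k})}{W^{\mathrm{new}}},
\qquad
W^{\mathrm{new}} = \sum_{k=1}^{R} \exp(-\beta u^{\mathrm{ext}}_{k})
\qquad (13)
$$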



u^ext includes all interactions not in u^int, that is, all other
backbone and side chain interactions. Because these
determinations occur in torsional angle space and change the
volume element in that space, the Jacobian, determined by
equation 35, of the selected Type II move is also needed as a
weight in the acceptance probability for detailed balance.
This acceptance probability for Type II moves is: 

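$$
\mathrm{accept}(\mathrm{curr} \rightarrow \mathrm{prop}) =
\min\left[1, \frac{W^{\mathrm{new}} J^{\mathrm{new}}}{W^{\mathrm{old}} J^{\mathrm{old}}}\right]
\qquad (14)
$$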



The weight and Jacobian of the reverse transformation
from the proposed to the current structure are also needed in
the acceptance probability for Monte Carlo detailed balance.
These quantities are determined as follows. Using the
proposed backbone structure just selected as the basis,
generate a set of K'-1 new φ_0 torsional angles according to
p^int and also include the current φ_0 in the set. Then solve
the geometric constraint to determine the permitted
alterations. The current configuration, since it exists,
must be among the permitted structures. From this set of
permitted structures determine W^old per equation 13. Then
select the current configuration and compute the Jacobian J^old
per equation 35. This completes the determination of the
acceptance probability.
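
The selection and acceptance steps described above can be sketched as follows. This is a minimal illustration, not the listing from the microfiche appendix. It assumes that each permitted backbone alteration contributes a Boltzmann factor exp(-u/kT) of its remaining interaction energy to the weight W. The names `rosenbluth_select` and `acceptance`, the parameter `kT`, and the energies in the usage note are hypothetical.

```python
import math
import random


def rosenbluth_select(ext_energies, kT=1.0, rng=random):
    """Select one candidate with probability exp(-u/kT) / W.

    Returns (index, W), where W is the sum of the Boltzmann factors
    of all candidates (the assumed form of the weight).
    """
    factors = [math.exp(-u / kT) for u in ext_energies]
    total = sum(factors)
    if total == 0.0:
        raise ValueError("all candidates have zero weight")
    threshold = rng.random() * total
    running = 0.0
    for index, factor in enumerate(factors):
        running += factor
        if threshold < running:
            return index, total
    return len(factors) - 1, total  # guard against round-off


def acceptance(w_new, j_new, w_old, j_old):
    """Acceptance probability min[1, (W_new J_new) / (W_old J_old)]."""
    return min(1.0, (w_new * j_new) / (w_old * j_old))
```

In use, `rosenbluth_select` would be called once on the energies of the forward candidates to obtain the proposed alteration and its weight, and once on the reverse set (which includes the current configuration) to obtain the old weight; the two Jacobians are then supplied to `acceptance`.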

Proline is approximated. Proline is not subject to Type
I moves. However, proline is subject to normal Type II
moves, with its side chain bond to the amino nitrogen broken.
The side chain thus moves rigidly with its backbone rigid
unit as in a normal Type II move. To compensate for the broken
bond approximation, the C_α-N torsional energy amplitude in the
proline backbone is set at approximately 5 kcal/mole. (By
contrast, the torsional energy of the C_α-N bond in a typical
amino acid is approximately 0.3 kcal/mole.) This invention is
adaptable to other suitable approximations for proline.
Alternatively, the proline side chain may be subject to
alterations which preserve its cyclicity, such as, for
example, by an extension of the constraint scheme just
described.


Program detailed description 

The following describes the construction and use of a
computer method and apparatus to perform the method of step
5. The listing of this code is included in a microfiche
appendix to this specification. Fig. 15 is a general view of
the computer system and its internal data and program
structures. To the left in Fig. 15 are the principal data
structures of this method. Current structures 1701 contains
the current structures of the N binders represented in memory
as described. Proposed structure 1702 contains working
memory areas used to generate a proposed new structure for
one binder peptide. Structures 1701 and 1702 would typically
be stored in RAM of the computer system, RAM being memory
directly accessible to processor fetches. Stored structures
1703 contain similar memory representations of all the peptide
structures generated, accepted, and selected for storage.
These are typically stored in permanent disk file(s).

Candidate pharmacophore structures 1704 are input to the
programs from either a disk file or the display and input
unit 1712. The identified candidate structures are used to
determine the vr'i^i^ in Eqn 11. 

Parameters 1705 comprise several parts. First are all
the AMBER atomic interaction definitions and parameters.
Second are standard representations of the amino acids,
including component rigid units and atomic charge
assignments. Third are parameters controlling the run.
These further comprise, by example, values for K and K', the
Type I/II move branching ratio, the number of moves made in
the simulation run, the simulation total energy record, etc.
The parameters would typically be loaded from disk file(s)
into RAM for manipulation during a simulation run.

Unit 1712 includes display and input devices for
monitoring and control. Depicted on the display are the
total number of moves made in the current run and the course
of the total energy, which is similar to that illustrated in
Fig. 9.

Processor 1711 is loaded with necessary programs prior
to a simulation run and executes the programs to perform the
simulation method. The general structure consists of main
program 1706, structure modification program 1707, Type I and
II move generators 1708 and 1709, and subroutines 1710. The
subroutines consist of common utility subprograms, such as
for performing torsional rotations about bonds and computing
interaction energies by the previous methods, and
conventional library subprograms, such as for performing
input and output and finding random numbers. Any
scientifically adequate random number generator can be used.
A reference for random number generators is Press et al.,
Numerical Recipes: The Art of Scientific Computing,
Cambridge, U.K., Cambridge University Press (1986), chapter
7. The invention is equally adaptable to other program
structures that will occur to those skilled in computer
simulation arts.

The preferred embodiment of these structures is an Indigo
2 workstation from Silicon Graphics (Mountain View, CA).
Alternatively, any high performance workstation, such as
products of Hewlett-Packard, IBM or Sun Microsystems, could
be used. Preferably the data and program structures are
coded in the C computer language. Alternatively any
scientifically oriented language, such as Fortran, could be
used. Conventional subroutine and scientific subroutine
libraries are used where appropriate.
The program components will now be described in detail
with reference to Figs. 16, 17, 18, and 19. Fig. 16
illustrates main program 1706. The peptide sequences of the
N binders are input at step 1801. All necessary AMBER
parameters (bond lengths and angles, atomic types and
charges, interaction parameters, amino acid definitions, etc.)
are input at step 1802. Step 1803 creates initial
structures from this input data. Rigid unit records for all
rigid units are created and linked to represent peptides.
The geometric structures of these peptides either are
obtained from a prior run or are built by adding side chains
to a prototypical backbone characteristic of the library of
the binder. A prototypical backbone for the CX6C library is
found in the microfiche appendix under the heading CX6C.CAR.
The initial binder structures are stored in the current
structure data areas in preparation for the beginning of the
main steps of the method.

Step 1804 begins the main loop of the simulation with
the generation of a proposed modified structure for one of
the binder peptides by structure modification program 1707.
As part of proposed structure generation, an acceptance
probability, accept(curr->prop), is determined as previously
described. The proposed structure will be accepted at 1805
based on this probability. For example, a random number
between 0 and 1 is generated, and the proposed structure is
accepted if the random number is less than the acceptance
probability. If the proposed structure is accepted, then it
is tested for sufficient distinctiveness at step 1806. This
test is met if at least one atomic position in the proposed
structure differs from the corresponding position in the
current structure by at least approximately 0.2 Å. If the
proposed structure is distinct, it is stored at 1807 in the
structure store for later analysis. Whether distinct or not,
the accepted proposed structure for the peptide replaces the
corresponding current structure at step 1808.
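
Steps 1805 to 1808 can be sketched as below. This is an illustrative outline only: it assumes a structure is held as a list of (x, y, z) atomic positions in angstroms, and the function names, the `store` list, and the default threshold argument are hypothetical stand-ins for the program's own data areas.

```python
import math
import random


def is_distinct(current, proposed, threshold=0.2):
    """True if at least one atom moved by at least `threshold` angstroms."""
    return any(math.dist(a, b) >= threshold for a, b in zip(current, proposed))


def monte_carlo_step(current, proposed, accept_prob, store, rng=random):
    """Apply the accept / distinctiveness / replace logic of steps 1805-1808."""
    # Step 1805: accept if a uniform random number is below the probability.
    if not rng.random() < accept_prob:
        return current
    # Steps 1806-1807: store the accepted structure only if it is distinct.
    if is_distinct(current, proposed, 0.2):
        store.append(proposed)
    # Step 1808: the accepted structure becomes the current structure.
    return proposed
```

A rejected proposal leaves both the current structure and the store unchanged, while an accepted but indistinct proposal still replaces the current structure without being stored.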

The simulation is tested for completion at step 1809.
Completion can be controlled by the operator at station 1712
depending on display of run progress results. Alternatively,
termination can be mechanically controlled. After completing
a certain number of total moves after run energy
equilibration, the moves being split between Types I and II
according to the specified branching ratio, the run is
terminated. The preferred number of total moves is 25,000,
and the preferred Type I/II branching ratio is 4. Thus it is
preferred to have 20,000 Type I and 5,000 Type II moves after
equilibration per simulation run.

At step 1810, the stored structures are analyzed to
determine both the consensus pharmacophore structure and the
structures of the remainder of the binders. In the preferred
embodiment, atomic positions in the equilibrated stored
structures for each peptide are averaged to obtain the
predicted geometric structure. The shared pharmacophore
structure is obtained from the predicted structure of each
peptide, again by averaging the shared position information
for all peptides. Alternatively, before structure averaging,
the structures generated for each binder can be clustered
into similar groups and the clusters for each peptide
separately averaged. The clusters would represent
alternative peptide folding patterns. It is anticipated that
because preferred binders are short peptides constrained by
disulfide bridges, any alternative foldings identified will
be structurally similar. The clustering can be done by the
exemplary methods found in the previously referenced article
Gordon et al., Fuzzy cluster analysis of molecular dynamics
trajectories, Proteins: Structure, Function, and Genetics
14:249-264 (1992). For all analysis methods, the choice of
the preferred number of stored moves is adjusted to achieve
adequate estimated statistical position errors. Further,
preferably, the results of three runs are combined to achieve
increased statistical confidence.
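
The position averaging of step 1810 can be sketched as follows. This is a simplified illustration under stated assumptions: every stored structure of a peptide lists the same atoms in the same order, and all structures are already expressed in a common coordinate frame. The function name `average_structure` is hypothetical.

```python
def average_structure(structures):
    """Average atomic positions over a list of stored structures.

    Each structure is a list of (x, y, z) tuples, one per atom, with
    the same atom ordering in every structure (an assumption here).
    """
    if not structures:
        raise ValueError("no stored structures to average")
    count = len(structures)
    n_atoms = len(structures[0])
    averaged = []
    for atom in range(n_atoms):
        averaged.append(tuple(
            sum(structure[atom][axis] for structure in structures) / count
            for axis in range(3)
        ))
    return averaged
```

The same routine could be applied separately to each cluster when the clustering alternative is used, since it only requires a list of structures with matching atom order.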

Other information is also output. Particularly
important is the course of the total energy for each peptide
and for all the peptides, and the intra-molecular, consensus,
and constraint components of the energies. These energy
components are used in determining whether a consensus
pharmacophore has been found. As previously described, this
is preferably done by ensuring that H_consensus is small compared
to the total energy and is minimized by a particular
candidate pharmacophore. Also Hkkr must be relatively small.

Finally at 1811, all results are output in a form usable 
for the subsequent steps 6 and 7 of Fig. 1. For example, 
this may be a particular file format suitable for subsequent 
lead compound search by a database query. 

Turning now to Fig. 17, structure modification program
1707 will be described. This is invoked from the main
program at 1804. Upon entry, this program randomly picks one
of the binder peptides at 1901 for which to generate a
proposed structure and also picks which type of move to use
at 1902. This latter random choice is made according to an
adjustable Type I/II branching ratio (preferably 4). For a
Type I move, step 1903 picks a random amino acid side chain
of the selected peptide, and step 1904 invokes the Type I
move program. (Proline has no Type I moves.) For a Type II
move, step 1905 picks a random backbone bond between rigid
units to rotate and also a random direction from the picked
bond along which backbone rigid unit structure will be
altered. Step 1906 invokes the Type II move program.
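
The random choices at 1901 and 1902 can be sketched as below. This is an assumption-laden outline: it interprets a Type I/II branching ratio of b as choosing Type I with probability b/(b+1), which reproduces the preferred 20,000/5,000 split for 25,000 moves. The names `choose_move` and `expected_counts` are hypothetical.

```python
import random


def choose_move(n_peptides, branching_ratio=4.0, rng=random):
    """Pick a peptide index uniformly and a move type by branching ratio."""
    peptide = rng.randrange(n_peptides)
    p_type1 = branching_ratio / (branching_ratio + 1.0)
    move = "I" if rng.random() < p_type1 else "II"
    return peptide, move


def expected_counts(total_moves, branching_ratio):
    """Expected numbers of Type I and Type II moves for a run."""
    type2 = total_moves / (branching_ratio + 1)
    return total_moves - type2, type2
```

For the preferred parameters, `expected_counts(25000, 4)` gives 20,000 Type I moves and 5,000 Type II moves.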

Figs. 18A and 18B illustrate the Type I move generator
1708, which is defined by equations 6 and 7. With reference
first to Fig. 18A, the proposed structure of the selected
peptide is created from its current structure by removing the
selected side chain. All intra-molecular interactions are
subsequently determined with respect to the proposed
structure absent side chain rigid units not yet regrown. K
candidate new torsional angles for the next, i'th, rigid unit
to add are generated by p_i^int at 2002. Preferably K is between
10 and 100. Generation of these angles uses the conventional
rejection method referenced in Press et al. at § 7.3. The
weight w^ext and p_i^ext are determined for each of these
candidate angles. This requires the rigid unit to be added
to be rotated to the candidate angle using the previous
increments. When a root is located in a 0.04° segment, it is
refined with the bisection method referenced in Press et al.
at § 9.1. It is expected on the average that 6K'
solutions will be found. If no roots are found at 2103, the
candidate rotation is impossible and this move is skipped.
If solutions exist, next, at 2104, p^ext and W^new are determined.
Using the described rotation method, the backbone rigid units
are rotated (with consequent spatial displacement of 4 units)
to a candidate torsional angle solution about their mutual
bonds. Additionally, any side chains attached to backbone
rigid units are rigidly rotated using the same method.
Having made these rotations, candidate interatomic distances
and candidate interaction energies can be determined and used
to obtain p^ext for this candidate solution. One of the
candidates is probabilistically selected at 2104, and the
backbone and any side chains are rotated according to this
candidate into the proposed structure. The Jacobian of this
transformation is determined at 2106 by equation 35. Lastly
the old acceptance weight and Jacobian are determined at
2107. From the weights and Jacobians the move acceptance
probability is found for use at 1805.

Fig. 19B details the determination 2107 of W^old and J^old
for the reverse move from the proposed to the current side
chain structure. Temporarily the proposed structure is used
as the basis for energy determination at 2008, and the
current structure is restored at 2016, when this process is
finished. At 2109, a set of K'-1 candidate torsional angles
is generated for the selected backbone bond according to p^int
using the rejection method, and the current torsional angle is
added to this set. If, as preferred, K' is 1, this step
results in a set with only the current angle. At 2111,
similarly to 2102, the permitted torsional rotations about
adjacent backbone bonds are determined from the equations
expressing the concerted rotation constraints. Special care
is taken to ensure that the original conformation is found by
the root finding procedure. In particular, the search
interval is centered on the known original and is made as
small as necessary to isolate the root, which may be as small
as 0.004° or smaller. The current structure must be among
these solutions, since it exists. Select it at 2112. W^old is
computed from the candidate angle solution, making the
candidate rotations and determining candidate interactions.
Also the Jacobian, J^old, of the transformation from the
proposed to the current structure is computed.

5.8. CONSENSUS STRUCTURE TEST

Having selected a candidate pharmacophore and determined
a best possible consensus structure and best possible
structures for the remainder of the binder molecules, the
consensus test, step 6, tests whether a consensus structure
has actually been found. A consensus pharmacophore structure
consists of a spatial arrangement of chemically similar
groups shared by all the N binders to high accuracy. Since
an actual pharmacophore exists, the N specifically binding
members of the screened libraries will share the actual
structure. However, the remainder of binder molecules will
share no other similar structures to such a high accuracy.
Therefore, a structure consensus of the N binders is possible
only if the candidate pharmacophore is the actual physical
pharmacophore responsible for the actual binding. If the
candidate selected relates to other parts of the binder
molecules, no structure consensus will be found. Further, if
the Monte Carlo determination attempts to impose a consensus
on parts of the binder molecules that do not share structure,
an inconsistent overall structure will be obtained for the
remainder of the binder molecules.

Therefore, two preferred consensus tests are applied:
one test asks whether a consistent candidate pharmacophore
has been obtained, and a second test asks whether consistent
structures have been obtained for the remainder of the binder
molecules. Both tests have a preferred absolute and a less
preferred relative version.

There are two portions for the first test. First, are
all the consensus pharmacophore distances obtained in the N
resolution structure is used in step 7 to determine lead
compounds for use as a drug that will bind to the original
target of interest.

Thus, one or more lead compounds are determined that
share a pharmacophore specification with the determined
consensus pharmacophore structure. This determination can
preferably be done by one of several methods: by a search of a
database of potential drug compounds or of chemical
structures (e.g., the Standard Drugs File (Derwent
Publications Ltd., London, England), the Beilstein database
(Beilstein Information, Frankfurt, Germany or Chicago), and
the Chemical Registry (CAS, Columbus, OH)) to identify
compounds that contain the pharmacophore specification; by
modification of a known lead compound to include the
pharmacophore specification; by synthesizing a de novo
structure containing the pharmacophore specification; or by
modification of binders to the target molecule (e.g.,
isolated in step 2) outside of the pharmacophore structure to
render the binder more attractive for use as a drug (e.g., to
increase half-life, solubility, ability to achieve desired in
vivo localization).

Database search queries are based not only on chemical
property information but also on precise geometric
information. Computer-based approaches rely on database
searching to find matching templates; Y.C. Martin, Database
searching in drug design, J. Medicinal Chemistry, vol. 35, pp.
2145-54 (1992), which is herein incorporated by reference.
Existing methods for searching 2-D and 3-D databases of
compounds are applicable to this step. Lederle of American
Cyanamid (Pearl River, New York) has pioneered molecular
shape-searching, 3D searching and trend-vectors of databases.
Commercial vendors and other research groups have enhanced
searching capabilities [MACCS-3D, Molecular Design Ltd. (San
Leandro, CA); CAVEAT, Lauri, et al., University of
California (Berkeley, CA); CHEM-X, Chemical Design, Inc.
(Mahwah, N.J.)].
The pharmacophore structure determined in this invention
is adaptable to any of these methods and sources of chemical
database searching and to the enumerated non-database
methods. Output will be lead compounds suitable for drug
design. An important aspect of this invention is that the
high resolution pharmacophore structure will lead to highly
targeted leads. Lower resolution structures result in a
geometric increase in the number of lead compound query
matches. Example 1 illustrates this effect.

5.10. APPENDIX: CONCERTED ROTATION 

Since the preferred molecules under consideration are
conformationally constrained by disulfide bridge(s), a Monte
Carlo move that preserves this constraint is required. The
"concerted rotation" scheme used for alkanes can be extended
to allow rotation of the torsional angles in conformationally
constrained peptides. This appendix describes this
extension. Dodd et al. (1993) discusses the original,
restricted method. (The essential extensions are expressed
in equations 27, 28, and 34.) This method is directly
applicable to the cyclic residue of proline, and an
alternative embodiment of this invention would thermally
perturb proline with a move of similar geometric constraints.
Fig. 14 illustrates the geometry under consideration.
Illustrated backbone 1600 is a poly-glycine 7-mer. Rigid
unit positions are indicated generally by black circles as at
1601 with incoming bonds generally as at 1602. The torsional
rotations φ_0 to φ_6 are about bonds 1602 to 1608, respectively,
between sequential, adjacent rigid units. The rigid unit
position vectors r_0 to r_6, illustrated as vectors 1610 to
1616, respectively, define the position of these sequential
rigid units with respect to a laboratory coordinate system
with origin 1609. A C_α rigid unit (B unit) is illustrated in
box 1630, and an amide bond (C unit) in box 1631.

To formulate this method, let us consider rotating about
seven torsional angles, which will displace the root
positions and rotate four rigid units, rotate up to three
additional ones, and leave the rest of the peptide fixed.
The root position of a rigid unit is the C_α position for a B
unit, the C position for a C unit, the C position for a CH2
unit, and the S position for the S unit in cystine. If unit
5 is a C unit, however, r_5 is defined to be the backbone amino
nitrogen position of that unit. For each unit, let us define
θ_i to be the fixed angle between the incoming and outgoing
bonds. Thus, θ_i = 0 for a C unit, and θ_i = 70.5° for all
others.

The method leaves the positions r_i of units i ≤ 0 or i ≥ 5
fixed. The torsion φ_0 is changed by an amount δφ_0. The
values of φ_i, 1 ≤ i ≤ 6, are then determined so that only the
positions r_i of units 1 ≤ i ≤ 4 are changed.

The method requires several definitions to present the
solution for the new torsional angles. The bond vectors are
defined to be the difference in position between unit i and
unit i - 1, as seen in the coordinate system of unit i:

$$
l_i = r_i^{(i)} - r_{i-1}^{(i)} \qquad (15)
$$

The bond vectors are illustrated in Fig. 14 at 1620 to
1624. The length and orientations of the l_i are
determined by rigid unit structure and the length and angle
AMBER parameters for bonds between atom types. The
coordinate system of unit i is such that the incoming bond is
along the x̂ direction. Thus l_i = l_i x̂ if atoms r_i and r_{i-1}
are directly bonded to each other, and l_i has x- and
y-components otherwise. Here x̂ is a fixed unit vector along
the x direction. Now define a rotation matrix that transforms
from the coordinate system of unit i+1 to unit i:
$$
T_i = \begin{pmatrix}
\cos\theta_i & \sin\theta_i & 0 \\
\sin\theta_i \cos\phi_i & -\cos\theta_i \cos\phi_i & \sin\phi_i \\
\sin\theta_i \sin\phi_i & -\cos\theta_i \sin\phi_i & -\cos\phi_i
\end{pmatrix}
\qquad (16)
$$



The positions of the units in the frame of unit 1 are, thus,
given by:

(17)

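Further define the matrix that converts from the frame
of reference of unit 1 to the laboratory reference frame:

$$
T_1^{\mathrm{lab}} = \left[\cos\psi \, I + n n^{T} (1 - \cos\psi) + M \sin\psi\right] A
\qquad (18)
$$

where

$$
M = \begin{pmatrix}
0 & -n_z & n_y \\
n_z & 0 & -n_x \\
-n_y & n_x & 0
\end{pmatrix}
\qquad (19)
$$

and

$$
n = \frac{\hat{x} \times \hat{r}}{|\hat{x} \times \hat{r}|},
\qquad
\cos\psi = \frac{\hat{x} \cdot \hat{r}}{|\hat{x}| \, |\hat{r}|}
$$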
where r̂ is the axis of the bond coming into unit i. The
matrix A is a rotation about x̂ and is defined so that

where

$$
A = \begin{pmatrix}
1 & 0 & 0 \\
0 & c & -s \\
0 & s & c
\end{pmatrix}
$$



c 

5 



(l,,,Ai^.-l,,Ar,)/(Ax,NAx/) 
(-I'l^Aiy ^IjyAr J / (Aiy' +Ar^ 



(20) 



(21) 



Here AB = A[T^*^] iz ^"Z if unit 0 is a C unit. Otherwise, 



The method proceeds by solving for φ_i, 2 ≤ i ≤ 6,
analytically in terms of φ_1. Then a nonlinear equation is
solved numerically to determine which values of φ_1, if any,
are possible for the chosen value of φ_0.
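
The bracketed factor of equation 18 has the form of the standard axis-angle (Rodrigues) rotation, with M of equation 19 the cross-product matrix of the axis n. The sketch below illustrates only that factor, under the assumption that it is meant to carry the frame's x axis onto the incoming bond axis; the additional rotation A about the x axis is omitted, and all function names are hypothetical.

```python
import math


def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def normalize(v):
    norm = math.sqrt(dot(v, v))
    return tuple(x / norm for x in v)


def rodrigues(n, psi):
    """Return cos(psi) I + (1 - cos(psi)) n n^T + sin(psi) M as a 3x3 list."""
    c, s = math.cos(psi), math.sin(psi)
    nx, ny, nz = n
    m = [[0.0, -nz, ny], [nz, 0.0, -nx], [-ny, nx, 0.0]]
    return [[c * (i == j) + (1.0 - c) * n[i] * n[j] + s * m[i][j]
             for j in range(3)] for i in range(3)]


def rotation_x_to(r):
    """Rotation that carries the unit x vector onto the direction of r."""
    x_hat = (1.0, 0.0, 0.0)
    r_hat = normalize(r)
    axis = cross(x_hat, r_hat)
    if math.sqrt(dot(axis, axis)) < 1e-12:
        raise ValueError("r is parallel to x; the rotation axis is undefined")
    n = normalize(axis)
    psi = math.acos(max(-1.0, min(1.0, dot(x_hat, r_hat))))
    return rodrigues(n, psi)


def rotate(matrix, v):
    return tuple(dot(row, v) for row in matrix)
```

Applying `rotate(rotation_x_to(r), (1.0, 0.0, 0.0))` returns the unit vector along r, which is the property the frame conversion relies on.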

The derivation proceeds in the coordinate system of unit
1, after it has been rotated by the chosen φ_0. Define



(22) 



If θ_3 ≠ 0 and θ_5 ≠ 0, one can see from Fig. 14 that the
distance between unit 3 and unit 5 is known and given by

$$
|r_5 - r_3|^2 = (l_{4x} \cos\theta_4 - l_{4y} \sin\theta_4 + l_{5x})^2
+ (l_{4x} \sin\theta_4 + l_{4y} \cos\theta_4 + l_{5y})^2
\qquad (23)
$$



But this distance can also be written as 



35 
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(24) 



Equating these two results, two values of 4>2 are possible 
4>1 = arcsin(Ci) - arctan (x^/x^) - // [x^) 



<P2 



71 -arcsin(Cj) - arctan(x^/x^) - H [x^) , 



(25) 



10 



with 



0, x>0 
x<0 



(26) 



25 The constant Cj is given by 



20 



25 



-2 (sine^Jj^-cose^Jj ) (x/*;>f|) 



(Xg-X;) -(X^-X;) /it-i5-J4^cose^-x^(cose^J3^-^sine;J3y} 



. 63*0,65- 



i3^cos9« -x^ (cose^ J3^^sin9,i3^) ^ ^ 



(27) 



30 



where x is given by Egn. 24 if 65 0, and x = irMli***] "M£e - 
Is)/1« if fij « 0. Clearly for there to be a solution |ci| < 1. 
The last three equations for Cj were determined by conditions 
similar to equating Eqns. 23 and 24. For e, s 0, 65 *« 0, the 
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X component of r^*^' - r^'^* is known to be equal to (l^,, ^ 
Ijcosej . For 63 9« 0, 65 = 0, the x component of x^*** - r,*^' is 
known to be equal to 1^^ + l^^cosG^. For 83 = 0, = 0, the 
angle between £3 - r^ and r^ - r^ is known to be equal to . 
5 To determine 03 two expressions for jr, - r^|^ are again 

equated to determine that: 

c, = -^s'y'--^'-2yx(cose3j,,.sine3i,,) 



10 



<|»3 = arcsin(Cj) - arctan (y^/y^) - H (y,) 
4>r = 71 -arcsintCj) - arctan (y^y^) - H (y^) , 



(29) 



15 



where y= rV (T^'x-lj) -i^ . . Again, |cj| < 1 for there to be 



a solution. 

It Qf, * 0, the value of 0« can now be determined from: 

20 

■ = ^T^TjTjT.ij. (30) 

Defining 

flj = T j't -/t (T \"') -Mx, - X, ) . (31) 

25 

the equations that define are given by 



q^y = cos<l)<{sine<i5j, - cose^IsP 
g,, = sin4»<(sine«J5, - cosB^Js^) 



(32) 



30 

This is a successful rotation if the position of ^ is 
successfully predicted. That is, the equation 

X',^'-X<5" = T,T2T3T,T5l^ = [T\'^)-Mx,-Xs) • <33) 

35 

must be satisfied. Consider the x-component, which implies 
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^^r^,)'izr^,)-2,i,cose,^o, 63-0,65=0 <3'»> 

k --Cj - [ ( ifix^isx) ' * J/y] ^^^ = 0 , e, =0 . 6, =0 



must be satisfied if the rotation is successful. The
equations for the case 6^ = 0 clearly express the geometric 
conditions required for a successful rotation. 

Eqn. 34 is the nonlinear equation for φ_1, because φ_2, φ_3,
and φ_4 are determined by Eqns. (25), (29), and (32) in terms
of φ_1. This equation has between zero and four values for
each value of φ_1, however, due to the multiple root character
of Eqns. (25) and (29). The equation is solved by searching
the region -π < φ_1 < π for zero crossings. The search is in
increments of approximately 0.04°. These roots are then
refined by a bisection method.
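
The search procedure can be sketched as follows. This is a generic scan-and-bisect outline, not the program of the microfiche appendix: the callable `f` stands in for the residual of Eqn. 34 on one branch, it is assumed to return NaN wherever no solution exists (for example when an arcsine argument exceeds 1 in magnitude), and the names `bisect_root` and `find_roots` and the tolerance are hypothetical.

```python
import math


def bisect_root(f, a, b, tol=1e-10):
    """Refine a root bracketed by a sign change on [a, b] by bisection."""
    fa = f(a)
    while b - a > tol:
        mid = 0.5 * (a + b)
        fm = f(mid)
        if fm == 0.0:
            return mid
        if fa * fm < 0.0:
            b = mid
        else:
            a, fa = mid, fm
    return 0.5 * (a + b)


def find_roots(f, lo=-math.pi, hi=math.pi, step=math.radians(0.04), tol=1e-10):
    """Scan [lo, hi] in fixed increments and refine every zero crossing."""
    roots = []
    n_steps = int(math.ceil((hi - lo) / step))
    a, fa = lo, f(lo)
    for k in range(1, n_steps + 1):
        b = min(lo + k * step, hi)
        fb = f(b)
        if fa == 0.0:
            roots.append(a)
        elif fa * fb < 0.0:  # False when either value is NaN
            roots.append(bisect_root(f, a, b, tol))
        a, fa = b, fb
    return roots
```

Because the two-valued Eqns. (25) and (29) give up to four branches, a caller would run `find_roots` once per branch and pool the results. Roots closer together than one scan increment can be missed, which is the usual limitation of a fixed-step search.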

The transformation from φ_i, 0 ≤ i ≤ 6, to the new solution,
which is constrained to change only r_i, 1 ≤ i ≤ 4, actually
implies a change in volume element in torsional angle space.
This change in volume element is the reason for the
appearance of the Jacobian in the acceptance probability.
The Jacobian of this transformation is calculated in Dodd et
al. (1993) at pp. 991-93. It is slightly different here since
the root position is not necessarily the head position. The
Jacobian is given by



25 



30 



35 



where the 5 x 5 matrix B is given by B,, » [jj, x - fc,)], for 

- ^ - - Ss)/|j:. - £slJ.-, for i « 4,5. Here h 

= £1. except that h, is the head position even if = 0, and' 
Hi is the incoming bond vector for unit i. 
Repeated application of the concerted rotation may lead
to a slightly imperfect structure, due to numerical precision
errors. In an alternative embodiment, peptide geometry would
be restored to an ideal state by application of the Random
Tweak algorithm after several thousand moves (Shenkin et al.,
1987, Biopolymers 26:2053-85).

The invention is further described in the following
examples which are in no way intended to limit the scope of
the invention.

6. EXAMPLES

6.1. RELATION BETWEEN EFFECTIVENESS OF 
POTENTIAL DRUG IDENTIFICATIONS AND 
PHARMACOPHORE GEOMETRIC TOLERANCE 

Searches of a drug library well known to medicinal
chemists, the Standard Drugs File (Derwent Publications Ltd.,
London, England), illustrate the geometric increase in the
number of compounds found (and thus decrease in expected
effectiveness of identification of potential drugs) as
pharmacophore geometric tolerance is increased. Table 4
tabulates the results.

Table 4

5HT3 (5-Hydroxytryptophan)

Tolerance (Å)    Number of drug compounds
2.0              64
1.0              35
0.5              27
0.25             12
0.10             1

Dopamine

Tolerance (Å)    Number of drug compounds
2.0              188
1.0              185
0.5              60
0.25             48
0.10             5



The pharmacophores are two well known neurotransmitters,
5-hydroxytryptophan and dopamine. As the tolerance of one
distance in the pharmacophore structure is decreased from 2.0
to 0.1 Å, the number of compounds retrieved from the database
is listed. The advantage of achieving pharmacophore
resolution better than approximately 0.25 Å is clear.

If the tolerance of three distances were involved, the
expected number of compounds retrieved would be the cube of
these numbers. For the dopaminergic pharmacophore, the
number of lead compounds would decrease from over 6.5x10^6 to
about 125 as three tolerances were decreased from 2.0 Å to
0.1 Å.
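
The cube estimate can be checked with a few lines of arithmetic. The sketch uses only the two dopamine entries of Table 4 quoted in the text (188 compounds at 2.0 Å and 5 compounds at 0.1 Å); the function name `expected_hits` is hypothetical, and treating the three distance tolerances as independent is the assumption made by the text itself.

```python
def expected_hits(single_distance_hits, n_distances=3):
    """Estimated matches when n independent distances share one tolerance."""
    return single_distance_hits ** n_distances


loose = expected_hits(188)  # dopamine, 2.0 angstrom tolerance
tight = expected_hits(5)    # dopamine, 0.1 angstrom tolerance
print(loose, tight)
```

The loose tolerance gives 188 cubed, about 6.6 million, while the tight tolerance gives 125, in line with the figures stated above.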

This example illustrates the geometric increase in the
number of leads identified as pharmacophore geometry is less
well defined. It is thus a highly preferred aspect of this
invention that the computational method results in
determining pharmacophore structure accurate to at least
approximately 0.25 to 0.30 Å. Thus an exponentially large
improvement in lead compound selection for drug design can be
expected to result from this invention.

6.2. EXPRESSION AND PURIFICATION
OF TARGET PROTEINS

Target molecules that are proteins, for example ras,
raf, VEGF and KDR, are expressed in the Pichia pastoris
expression system (Invitrogen, San Diego, CA) and as
glutathione-S-transferase (GST)-fusion proteins in E. coli
(Guan and Dixon, 1991, Anal. Biochem. 192:262-267).
The cDNAs of these target proteins are cloned in the
Pichia expression vectors pHIL-S1 and pPIC9 (Invitrogen).
Polymerase chain reaction (PCR) is used to introduce six
histidines at the carboxy-terminus of these proteins, so that
this His-tag can be used to affinity-purify these proteins.
The recombinant plasmids are used to transform Pichia cells
by the spheroplasting method or by electroporation.
Expression of these proteins is inducible in Pichia in the
presence of methanol. The cDNAs cloned in the pHIL-S1
plasmid are expressed as a fusion with the PHO1 signal
peptide and hence are secreted extracellularly. Similarly,
cDNAs cloned in the pPIC9 plasmid are expressed as a fusion
with the α-factor signal peptide and hence are secreted
extracellularly. Thus, the purification of these proteins is
simpler as it merely involves affinity purification from the
growth media. Purification is further facilitated by the
fact that Pichia secretes very low levels of homologous
proteins and hence the heterologous protein comprises the
vast majority of the protein in the medium. The expressed
proteins are affinity purified onto an affinity matrix
containing nickel. The bound proteins are then eluted with
either EDTA or imidazole and are further concentrated by the
use of centrifugal concentrators.

As an alternative to the Pichia expression system, the
target proteins are expressed as glutathione-S-transferase
(GST) fusion proteins in E. coli. The target protein cDNAs
are cloned into the pGEX-KG vector (Guan and Dixon, 1991,
Anal. Biochem. 192:262-267), in which the protein of interest
is expressed as a C-terminus fusion with the GST protein.
The pGEX-KG plasmid has an engineered thrombin cleavage site
at the fusion junction that is used to cleave the target
protein from the GST tag. Expression is inducible in the
presence of IPTG, since the GST gene is under the control
of the tac promoter. Induced cells are broken up by
sonication and the GST-fusion protein is affinity purified
onto a glutathione-linked affinity matrix. The bound
protein is then cleaved by the addition of thrombin to the
affinity matrix and recovered by washing, while the GST tag
remains bound to the matrix. Milligram quantities of
recombinant protein per liter of E. coli culture are expected
to be obtainable in this manner.

6.3. SYNTHESIS AND SCREENING OF POLYSOME-BASED
     LIBRARIES ENCODING RANDOM CONSTRAINED
     PEPTIDES OF VARIOUS LENGTHS

6.3.1. PREPARATION OF DNA TEMPLATES

DNA libraries with a high degree of complexity are made
as two components: an expression unit, and a semi-random (or
degenerate) unit. The expression unit has been synthesized
chemically as an oligonucleotide (termed T7RBSATG), and
contains the promoter region for bacteriophage T7 RNA
polymerase, a ribosome binding site, and the initiating ATG
codon. The random region, also synthesized as an
oligonucleotide (termed MNN6), contains a region complementary
to the expression unit, the antisense version of the codons
specifying Cys-X6-Cys, and a restriction site (BstXI). The
library is constructed by annealing 100 pmol of
oligonucleotide T7RBSATG [having the sequence
5'ACTTCGAAATTAATACGACTCACTATAGGGAGACCACAACGGTTTCCCTCCAGAAAT
AATTTTGTTTAACTTTAACTTTAAGAAGGAGATATACATATGCAT3'
(SEQ ID NO:2)] and oligonucleotide MNN6 [having the sequence
5'CCCAGACCCGCCCCCAGCATTGTGGGTTCCAACGCCCTCTAGACA(MNN)6ACAATG
TATATCTCCTTCTT3' (SEQ ID NO:3); M = A or C, N = G, A, T, or
C], and extending the DNA in a reaction mixture containing
10-100 units of Sequenase (United States Biochemical Corp.,
Cleveland, OH), all four dNTPs (at 1 mM), and 10 mM
dithiothreitol for 30 min at 37°C. The extended material is
then digested with BstXI, ethanol precipitated and
resuspended in water. This fragment of DNA is then ligated
via the BstXI end to a 250 base pair (bp), PCR-amplified
Glycine-Serine coding fragment derived from gene III of M13
bacteriophage DNA. The gene III fragment has been amplified
by use of two primers, respectively termed FGSPCR [having the
sequence 5'TCGTCTGACCTGCCTCAACCTCCCCACAATGCrrGGCGGCGGCTCTGGT3'
(SEQ ID NO:4)], and RGSPCR [having the sequence
5'ATCAAGTTTGCCTTTACCAGCATTGTGGAGCGCGTTTTCATC3'
(SEQ ID NO:5)], and Taq DNA polymerase (Gibco-BRL). The
amplified DNA (250 bp) was cut with BstXI to yield a 200 bp
fragment that has been gel purified. The 200 bp fragment is
then ligated to the random peptide coding DNA fragment. This
DNA specifies the synthesis of a peptide of the sequence
Met-His-Cys-(X)6-Cys- (SEQ ID NO:6) fused to the Gly-Ser rich
region of the M13 gene III protein. The Gly-Ser rich domain
is thought to behave as a flexible linker and assist in
presentation of the random peptide to the target molecules.

To make constrained random peptides of different
lengths, oligonucleotides are made that are similar to MNN6,
except that the degenerate region is 5, 7, 8, or 9 codons
long. In addition, oligonucleotides are made that code for
various shapes of constrained random peptides by specifying
sequences comprising three cysteine residues interspersed
among 6-10 randomly specified amino acids.
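
The size of such a library follows from simple counting, which the following C sketch makes explicit. It is an illustration only and is not part of the listings in Sec. 8. It assumes that each antisense MNN triplet corresponds to one of 4 x 4 x 2 = 32 sense codons, that every random position can encode at most 20 amino acids, and that the 100 pmol of oligonucleotide is converted to a molecule count with Avogadro's number. The function names are hypothetical.

```c
#include <assert.h>
#include <stdio.h>

/* Codons per random position for an (MNN)n antisense stretch:
   M = A or C (2 choices), N = G, A, T, or C (4 choices each). */
#define CODONS_PER_POSITION 32.0
/* Upper bound on distinct amino acids per random position (assumption). */
#define RESIDUES_PER_POSITION 20.0
#define AVOGADRO 6.02214076e23

/* Number of distinct DNA sequences in a degenerate region of n codons. */
double dna_diversity(int codons)
{
    double d = 1.0;
    assert(codons >= 0);
    for (int i = 0; i < codons; i++)
        d *= CODONS_PER_POSITION;
    return d;
}

/* Upper bound on distinct peptides encoded by n random codons. */
double peptide_diversity(int codons)
{
    double d = 1.0;
    assert(codons >= 0);
    for (int i = 0; i < codons; i++)
        d *= RESIDUES_PER_POSITION;
    return d;
}

/* Number of molecules in a given amount of oligonucleotide. */
double molecules_from_pmol(double pmol)
{
    return pmol * 1.0e-12 * AVOGADRO;
}

/* Print the diversity for the degenerate lengths named in the text. */
void print_library_table(void)
{
    for (int n = 5; n <= 9; n++)
        printf("%d codons: %.3e DNA sequences, %.3e peptides\n",
               n, dna_diversity(n), peptide_diversity(n));
}
```

Under these assumptions the 100 pmol of MNN6 used in the annealing step contains more molecules than there are distinct sequences even for a nine-codon degenerate region, so the synthesis scale is unlikely to be what limits library complexity.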

6.3.2. IN VITRO SYNTHESIS AND
       ISOLATION OF POLYSOMES

An E. coli S30 extract is prepared from the B strain
SL119 (Promega). Coupled transcription-translation reactions
are performed by mixing the S30 extract with the S30 premix
(containing all 20 amino acids), the linear DNA template
coding for peptides of random sequences (prepared as
described in Section 6.3.1 above), and rifampicin at 20
µg/ml. The reaction is initiated by the addition of 100
units of T7 RNA polymerase and continues at 37°C for 30 min.
The reaction is terminated by placing the reactions on ice
and diluting them 4-fold with polysome buffer (20 mM
Hepes-NaOH, pH 7.5, 10 mM MgCl2, 1.5 µg/ml chloramphenicol,
100 µg/ml acetylated bovine serum albumin, 1 mM dithiothreitol,
20 units/ml RNasin, and 0.1% Triton X-100). Polysomes are
isolated from a 50 µl reaction programmed with 0.5-1 µg of
linear DNA template specifying the synthesis of random
constrained peptides. To isolate polysomes, the diluted S30
reaction mixtures are centrifuged at 288,000 X g for 30-40
min at 4°C. The pellets are suspended in polysome buffer and
centrifuged a second time at 10,000 X g for 5 min to remove
insoluble material.

6.3.3. AFFINITY SELECTION/SCREENING OF POLYSOMES

The isolated polysomes are incubated in microtiter wells
coated with the target proteins. Microtiter wells are
uniformly coated with 1-5 µg of 6-His tagged, or glutathione
S-transferase fused, target proteins (see Section 6.2
hereinabove). Target proteins that are used include the
oncoproteins ras and raf, KDR (the vascular endothelial
growth factor [vEGF] receptor protein) and vEGF. The
microtiter wells are coated with 1-5 µg of these target
proteins by incubation in PBS (phosphate-buffered saline; 10
mM sodium phosphate, pH 7.4, 140 mM NaCl, 2.7 mM KCl), for
1-5 hours at 37°C. The wells are then washed with PBS, and the
unbound surfaces of the wells blocked by incubation with PBS
containing 1% nonfat milk for 1 hr at 37°C. Following a wash
with polysome buffer, each well is incubated with polysomes
isolated from a single 50 µl reaction for 2-24 hr at 4°C.
Each well is washed five times with polysome buffer and the
associated mRNA is eluted with polysome buffer containing 20
mM EDTA.

After affinity selection of the polysomes, the
associated mRNAs are isolated, and treated with 5-10 units of
DNase I (RNase-free; Ambion) for 15 min at 37°C after
addition of MgCl2 to 40 mM. The mRNA is phenol-extracted and
ethanol-precipitated and dissolved in 20 µl of RNase-free
water. A portion of the mRNA is used for cDNA preparation
and subsequent amplification using 15 pmol each of primers
RGSPCR [5'ATCAAGTTTGCCTTTACCAGCATTGTGGAGCGCGTTTTCATC3'
(SEQ ID NO:5)], and SELEXF1
[5'ACTTCGAAATTAATACGACTCACTATAGGGAGACCACAACGGTTTCC3'
(SEQ ID NO:9)] and rTth Reverse Transcriptase RNA PCR kit
(Perkin Elmer Cetus). Specifically, the mRNA is
reverse-transcribed into cDNA in a 20 µl reaction containing 1 pg
mRNA, 15 pmol of RGSPCR primer, 200 µM each of dGTP, dATP,
dTTP, and dCTP, 1 mM MnCl2, 10 mM Tris-HCl, pH 8.3, 90 mM KCl,
and 5 units of rTth DNA polymerase at 70°C for 15 min. In
the next step, the cDNA is amplified by the addition of 2.5
mM MgCl2, 8% glycerol, 80 mM Tris-HCl, pH 8.3, 125 mM KCl,
0.95 mM EGTA, 0.6% Tween 20, and 15 pmol of the SELEXF1
primer. The reaction conditions that are employed are 2 min
at 95°C for one cycle, 1 min at 95°C and 1 min at 60°C for 35
cycles, and 1 min at 60°C for one cycle. The amplified
product is then gel-purified and quantitated by
spectrophotometry at 260 nm. A portion of the amplified DNA
is digested with NsiI and XbaI and the resulting 30 base pair
fragment is directionally cloned into a monovalent phage
display vector. The DNAs inserted in the monovalent phage
display vector are then sequenced to determine the identity
of the peptides that were selectively retained by one cycle
of affinity binding to the target protein. A second portion
(0.5-1 µg) of the amplified DNA is subjected to another cycle
of affinity selection, mRNA isolation, cDNA amplification,
and cloning.
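
The amplification conditions given above (2 min at 95°C once, then 35 cycles of 1 min at 95°C and 1 min at 60°C, then 1 min at 60°C once) can be written down as a small data structure, for example when programming a thermal cycler or estimating run time. The C sketch below is illustrative only; the type and function names are hypothetical, and the computed time counts programmed holds only, ignoring temperature ramping.

```c
#include <assert.h>
#include <stddef.h>

/* One temperature hold within a stage. */
typedef struct {
    double temp_c;
    double minutes;
} Hold;

/* A stage is a list of holds repeated for a number of cycles. */
typedef struct {
    const Hold *holds;
    size_t n_holds;
    int cycles;
} Stage;

static const Hold INITIAL_DENATURE[] = { {95.0, 2.0} };
static const Hold AMPLIFY[]          = { {95.0, 1.0}, {60.0, 1.0} };
static const Hold FINAL_EXTEND[]     = { {60.0, 1.0} };

/* The program described in Section 6.3.3. */
static const Stage PCR_PROGRAM[] = {
    { INITIAL_DENATURE, 1, 1 },
    { AMPLIFY,          2, 35 },
    { FINAL_EXTEND,     1, 1 }
};
#define PCR_STAGE_COUNT (sizeof PCR_PROGRAM / sizeof PCR_PROGRAM[0])

/* Sum of all programmed hold times, in minutes (ramp times excluded). */
double total_hold_minutes(const Stage *stages, size_t n_stages)
{
    double total = 0.0;
    for (size_t s = 0; s < n_stages; s++) {
        double per_cycle = 0.0;
        assert(stages[s].cycles >= 1);
        for (size_t h = 0; h < stages[s].n_holds; h++)
            per_cycle += stages[s].holds[h].minutes;
        total += per_cycle * stages[s].cycles;
    }
    return total;
}
```

Calling total_hold_minutes(PCR_PROGRAM, PCR_STAGE_COUNT) gives the programmed hold time for the whole amplification.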

6.4. PHAGEMID SCREENING

Three different protocols for screening of a phagemid
library are presented in the subsections hereinbelow. These
protocols, particularly the immobilization and binding steps,
are readily adaptable to use for screening of different
libraries, e.g., polysome libraries. Preferably, different
methods are used in different rounds of screening.

6.4.1. PLATE PROTOCOL

In this example, a protocol is presented for screening a
phagemid library, in which, in the first round of screening, a
biotinylated target protein is immobilized (by the specific
binding between biotin and streptavidin) on a
streptavidin-coated plate. The immobilized target protein is
then contacted with library members to select binders.

Reagents Used:

Purified target protein, microfuge tubes, Falcon 2059,
Binding Buffer, Wash Buffer, Elute Buffer, phage display
library of >10^^ pfu/screened target, fresh overnight cultures
of appropriate host cells, LB agar plates with antibiotics as
needed, biotinylating agent NHS-LC-Biotin (Pierce Cat.
#21335), streptavidin, 50 mM NaHCO3 pH 8.5, 1 M Tris pH 9.1,
M280 Sheep anti-mouse IgG coated Dynabeads (Dynal), phosphate
buffered saline (PBS), Falcon 1008 petri dishes.

Wash Buffer = 1X PBS (Sigma Tablets), 1 mM MgCl2, 1 mM CaCl2,
0.05% Tween 20. (For one liter: 5 PBS tablets, 1 ml 1 M MgCl2,
1 ml 1 M CaCl2, 0.5 ml Tween 20, nanopure H2O to 1 liter.)

Binding Buffer = Wash Buffer with 5 mg/ml bovine serum
albumin (BSA).

Elute Buffer = 0.1 N HCl adjusted to pH 2.2 with glycine;
1 mg/ml BSA.
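
The per-liter quantities in the Wash Buffer recipe follow from the usual dilution relation C1 x V1 = C2 x V2. The short C sketch below, offered only as an illustration with hypothetical function names, reproduces that arithmetic so the recipe can be scaled to other volumes. It assumes the Tween 20 percentage is volume per volume, which is what the 0.5 ml per liter figure implies.

```c
#include <assert.h>

/* Volume of stock needed so that final_volume_ml has final_conc,
   from C1 * V1 = C2 * V2. Both concentrations must use the same unit. */
double stock_volume_ml(double stock_conc, double final_conc,
                       double final_volume_ml)
{
    assert(stock_conc > 0.0);
    assert(final_conc >= 0.0 && final_conc <= stock_conc);
    return final_conc * final_volume_ml / stock_conc;
}

/* Volume of a neat liquid for a given percent (v/v) of the final volume. */
double percent_volume_ml(double percent_vv, double final_volume_ml)
{
    assert(percent_vv >= 0.0 && percent_vv <= 100.0);
    return percent_vv * final_volume_ml / 100.0;
}
```

For example, stock_volume_ml(1000.0, 1.0, 1000.0) returns the volume of 1 M (1000 mM) MgCl2 or CaCl2 stock for one liter of buffer at 1 mM, and percent_volume_ml(0.05, 1000.0) returns the Tween 20 volume.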

Procedure:

Protein Biotinylation:

1. Wash 50-100 µg target protein in 50 mM NaHCO3 pH 8.5
in a Centricon (Amicon) of the appropriate molecular weight
cut-off.

2. Bring the total volume to 100 µl with 50 mM NaHCO3 pH
8.5.

3. Dissolve 1 mg of NHS-LC-Biotin in 1 ml H2O. Do not store
this solution.

4. Immediately add 37 µl of the NHS-LC-Biotin solution to
the target protein and incubate for 1 hr at room temperature
(RT).

5. Remove the unreacted biotin by washing 2X PBS in a
Centricon (Amicon) of the appropriate molecular weight
cut-off. Store the biotinylated protein at 4°C.

Coating a 1008 Plate with Streptavidin:

6. The night before the binding experiment precoat a 1008
plate with streptavidin.

7. Add 10 µg of streptavidin (1 mg/ml H2O) per 1 ml of 50 mM
NaHCO3 pH 8.5.

8. Add 1 ml of this solution to each plate and place in a
humidified chamber overnight at 4°C.

Prebinding; Blocking Non-Specific Sites:

9. To a streptavidin coated plate add 400 µl of Binding
Buffer (BSA blocking) for one hour at room temperature.

10. Rinse wells six times with Wash Buffer by slapping dry
on a clean piece of labmat.

Binding; Specific Target/Phage Complexes Round 1:

11. Add 10 µg of biotinylated target protein in 400 µl of
Binding Buffer to the well and incubate for 2 hr at 4°C.

12. Add 4 µl of 10 mM biotin and swirl for 1 hr at 4°C.

13. Wash as in step 10.

14. Add concentrated phage library (>10" pfu) in 400 µl of
Binding Buffer and swirl overnight at 4°C.

Washing and Elution:

15. Slap out binding mixture and wash as in step 10.

16. To elute bound phage add 400 µl of Elution Buffer and
rock at RT for 15 min.

17. Transfer the elution solution to a sterile 1.5 ml tube
which contains 75 µl of 1 M Tris pH 9.1. Vortex briefly.

Amplification of Round 1 Eluted Phage:

18. Plate all of the eluted round 1 phage by adding 157 µl
of phage to 200 µl of cells incubated overnight (previously
checked to be free of contamination) in three aliquots.
Incubate 25 min in a 37°C water bath and then spread onto LB
agar/antibiotics plate containing 2% glucose.

19. Scrape plates with 5 ml of 2XYT (growth broth)/
Antibiotics/Glucose and leave swirling for 30 min at RT.

20. Add the appropriate amount of 2XYT/Antibiotics/Glucose
to bring the O.D. 600 down to 0.4 and then grow at 37°C at
250 rpm until the O.D. 600 reaches 0.8.

21. Remove 5 ml and add to it 1.25 x 10" M13 helper phage.

22. Shake 30 min at 150 rpm and then 30 min at 250 rpm at
37°C.

23. Centrifuge 10 min at 3000 X g at RT.

24. Resuspend cells in 5 ml 2XYT with no glucose. (This step
removes glucose.)

25. Centrifuge as in step 23 and resuspend in 5 ml 2XYT with
kanamycin and the appropriate antibiotics (no glucose). Shake
18 hr at 37°C and 250 rpm.

26. Pellet cells at 10,000 X g and sterile filter the
phage-containing supernatant, which is now ready for round 2
screening.

27. Titer the round 1 eluted phage stocks.

Binding; Specific Target/Phage Complexes Rounds 2-5:

6. Combine -i ng of biotinylated target protein with the
eluted and titered round 1 phage (10» pfu) in 200 µl of
Binding Buffer and rock 4 hr at 4°C.

7. The night before the round 2 screening is started,
prewash 200 µl/target protein to be screened of sheep
anti-mouse IgG magnetic beads (M280 IgG Dynabeads) with 2X 1 ml of
Wash Buffer using the Dynal Magnet. Let the beads collect at
least 1 min before removing the buffer. Let the beads stand
15 sec to allow residual Binding Buffer to collect and remove
with a P200 Pipetman.

8. Resuspend the washed beads in 200 µl of Binding Buffer
and add 100 µl of mouse anti-biotin IgG (Jackson IRL). Rock
overnight at 4°C.

10. Wash the unbound anti-biotin IgG from the Dynabeads by
placing them on the Dynal magnet for at least 1 min and remove


Amplification of Round 2-5 Eluted Phage:

15a. Plate 10 µl and 100 µl of round 2,3,4 eluates using
200 µl of contamination free (previously tested) E. coli
XL1Blue cells onto each plate containing
tetracycline/ampicillin/glucose and tetracycline/ampicillin
and amplify as in Steps 17-25.

6.4.2. BIOTIN-ANTIBIOTIN IgG BEAD PROTOCOL

In this example, a protocol is presented for screening a
phagemid library, in which a biotinylated target protein is
immobilized (by the specific binding between anti-biotin
antibodies and biotin) on a magnetic bead containing
anti-biotin antibodies on the bead surface. The immobilized
target protein is then contacted with library members to
select binders.

Reagents Used:

M280 Sheep anti-Mouse IgG coated Dynabeads (Dynal)

Binding; Specific Target/Phage Complexes Round 1:

6. Combine 10 µg of biotinylated target protein with the
phage library (>10^*^ pfu) in 400 µl of Binding Buffer and rock
overnight at 4°C.

7. That same night prewash 50 µl sheep anti-mouse IgG
magnetic beads (M280 IgG Dynabeads) with 500 µl of Binding
Buffer twice using the Dynal Magnet. Let the beads collect
at least 1 min before removing the buffer. Let the beads
stand 15 sec to allow residual Binding Buffer to collect and
remove with a P200 Pipetman.

8. Resuspend the washed beads in 100 µl of Binding Buffer
and add 33 µl of mouse anti-biotin IgG (40 µg, Jackson IRL).
Rock overnight at 4°C.

9. Remove unbound protein from the phage/protein reaction
in Step 6 with a Microcon 100. Spin at 800 X g until
exclusion volume is met and wash twice with Wash Buffer
(again at 800 X g). Collect phage/protein with a Pipetman and
add an additional 50 µl of Wash Buffer to the Microcon,

Amplification of Round 1 Eluted Phage:

17. Plate all of the eluted round 1 phage by adding 157 µl
of phage to 200 µl of cells incubated overnight (previously
checked to be free of contamination) in three aliquots.
Incubate 25 min in a 37°C water bath and then spread onto LB
agar/antibiotics plate containing 2% glucose. Place plates
upright in 37°C incubator until dry and then invert and
incubate overnight.

18. Scrape plates with 5 ml of 2XYT/Antibiotics/Glucose and
leave swirling for 30 min at RT.

19. Add the appropriate amount of 2XYT/Antibiotics/Glucose
to bring the O.D. 600 down to 0.4 and then grow at 37°C at
250 rpm until the O.D. 600 reaches 0.8.

20. Remove 5 ml and add to it 1.25 x 10" M13 helper phage.

21. Shake 30 min at 150 rpm and then 30 min at 250 rpm at
37°C.

22. Centrifuge 10 min at 3000 X g at RT.

23. Resuspend cells in 5 ml 2XYT with no glucose. (This step
removes glucose.)

24. Centrifuge as in step 22 and resuspend in 5 ml 2XYT with
kanamycin and the appropriate antibiotics (no glucose). Shake
18 hr at 37°C and 250 rpm.

25. Pellet cells at 10,000 X g and sterile filter the
phage-containing supernatant, which is now ready for round 2
screening.

Binding; Specific Target/Phage Complexes Round 2, 3, & 4:

6a. Bind 1 µg of target protein with 100 µl of amplified
phage from the previous round as before, overnight at 4°C.

7a. Prepare the IgG anti-biotin/anti-IgG beads as in Steps
7-10 using, however, only 20 µl of sheep anti-mouse IgG and
13 µl of anti-biotin IgG.

8a. All other binding procedures are identical with Steps
6-11.

Washing and Elution:

9a. Place the binding reaction into the Dynal magnet and let
sit for 1 min.

10a. Remove the solution and discard using a P1000 Pipetman.
Let the beads stand 30 sec to allow residual Binding Buffer
to collect and remove with a P200 Pipetman.

11a. Remove the tube from the magnet and resuspend the beads
in 750 µl of Wash Buffer and return to the magnet. Again let
the beads pellet by waiting 1 min.

12a. Remove the wash solution as in Step 10a and repeat this
process 3 more times.

13a. After the removal of the fourth wash, resuspend the
beads and transfer them to a fresh, labeled tube and wash 4
more times.

14a. Elute and neutralize as in Step 15.

Amplification of Rounds 2, 3, & 4 Eluted Phage:

15a. Plate 10 µl and 100 µl of round 2,3,4 eluates and
amplify as in Steps 17-25.

6.4.3. BIOTIN-STREPTAVIDIN MAGNETIC
       BEAD PROTOCOLS

In this example, a protocol is presented for screening a
phagemid library, in which a biotinylated target protein is
immobilized (by the specific binding between biotin and
streptavidin) on a streptavidin coated magnetic bead. The
immobilized target protein is then contacted with library
members to select binders.

Reagents Used:

Purified target protein, M280 streptavidin coated Dynabeads
(Dynal)

Binding; Specific Target/Phage Complexes Round 1:

6. Combine 10 µg of biotinylated target protein with the
phage library (>10*** pfu) in 400 µl of Binding Buffer and rock
overnight at 4°C.

7. Remove unbound protein with a Microcon 100. Spin at
800 X g until exclusion volume is met, and wash twice with
Wash Buffer (again at 800 X g). Collect phage/protein with a
Pipetman and add an additional 50 µl of Wash Buffer to the
Microcon, gently triturate and combine with the first fraction
to ensure maximal recovery.

8. Prewash 50 µl (per reaction) of streptavidin magnetic
beads (M280 streptavidin Dynabeads) twice with 500 µl of
Wash Buffer using the Dynal magnet.

9. Add the prewashed Dynabeads to the protein/phage fraction
(add Binding Buffer to a total of 500 µl) and rock for 30 min.
Ensure that the beads mix thoroughly with the phage/protein
solution.

Washing and Elution:

10. Place the binding reaction into the Dynal magnet and let
sit for 1 min.

11. Remove the solution using a P1000 Pipetman and discard.
Let the beads stand 15 sec to allow residual Binding Buffer to
collect and remove with a P200 Pipetman. Note that serial
dilution depends upon all residual liquid being removed (i.e.,
5 µl into 500 is 100X washing; 50 µl into 500 is only 10X).

12. Remove the tube from the magnet and resuspend the beads
in 750 µl of Wash Buffer and return to the magnet. Again let
the beads pellet by waiting 1 min.

13. Remove the wash solution as in step 11 and repeat this
process 3 more times.

14. After the removal of the fourth wash, resuspend the beads
and transfer them to a fresh, labeled tube and wash once more.

15. To elute bound phage add 400 µl of Elution Buffer,
triturate and rock for 14 min at RT.

16. Place the tube on the magnet for one minute and transfer
the eluate to a sterile 1.5 ml tube which contains 75 µl of
1 M Tris pH 9.1. Vortex briefly.

Amplification of Round 1 Eluted Phage:

17. Plate all of the eluted round 1 phage by adding 157 µl of
phage to 200 µl of overnight cells (previously checked to be
free of contamination) in three aliquots. Incubate 25 min in
a 37°C water bath and then spread onto LB agar/antibiotics
plate containing 2% glucose. Place plates upright in 37°C
incubator until dry and then invert and incubate overnight.

18. Scrape plates with 5 ml of 2XYT/Antibiotics/Glucose and
leave swirling for 30 min at RT.

19. Add the appropriate amount of 2XYT/Antibiotics/Glucose
to bring the O.D. 600 down to 0.4 and then grow at 37°C at 250
rpm until the O.D. 600 reaches 0.8.

20. Remove 5 ml and add to it 1.25 x 10^^ M13 helper phage.

21. Shake 30 min at 150 rpm and then 30 min at 250 rpm at
37°C.

22. Centrifuge 10 min at 3000 X g at RT.

23. Resuspend cells in 5 ml 2XYT with no glucose. (This step
removes glucose.)

24. Centrifuge as in step 22 and resuspend in 5 ml 2XYT with
kanamycin and the appropriate antibiotics (no glucose). Shake
18 hr at 37°C and 250 rpm.

25. Pellet cells at 10,000 X g and sterile filter the
phage-containing supernatant, which is now ready for round 2
screening.
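
The note in step 11 of the washing procedure, that 5 µl of residual liquid carried into 500 µl is a 100X wash while 50 µl is only 10X, compounds over repeated washes. The C sketch below illustrates that arithmetic using the same simple ratio as the note (wash volume divided by residual volume). The function names are hypothetical and the sketch is not part of the Sec. 8 listings.

```c
#include <assert.h>

/* Dilution achieved by one wash, using the ratio convention of the
   protocol note: wash volume divided by residual carry-over volume. */
double wash_dilution(double residual_ul, double wash_ul)
{
    assert(residual_ul > 0.0 && wash_ul > 0.0);
    return wash_ul / residual_ul;
}

/* Cumulative dilution of unbound material after several identical washes. */
double cumulative_dilution(double residual_ul, double wash_ul, int washes)
{
    double total = 1.0;
    assert(washes >= 0);
    for (int i = 0; i < washes; i++)
        total *= wash_dilution(residual_ul, wash_ul);
    return total;
}
```

With four washes, leaving 5 µl behind each time dilutes unbound phage ten thousand times more than leaving 50 µl, which is why the protocol insists on removing the residual liquid with the smaller pipette.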

Binding; Specific Target/Phage Complexes Round 2, 3, & 4:

6a. Combine 1 µg of biotinylated target protein with 100 µl
of the previous round's phage (>10* pfu) in 400 µl of Binding
Buffer and rock overnight at 4°C.

7a. Remove unbound protein with a Microcon 100. Spin at
800 X g until exclusion volume is met and wash twice with Wash
Buffer (again at 800 X g). Collect phage/protein with a
Pipetman and add an additional 50 µl of Wash Buffer to the
Microcon, gently triturate and combine with the first fraction
to ensure maximal recovery.

8a. Prewash 20 µl (per reaction) of streptavidin magnetic
beads (M280 streptavidin Dynabeads) twice with 500 µl of
Wash Buffer using the Dynal magnet.

9a. Add the prewashed Dynabeads to the protein/phage fraction
and rock for 30 min. Add Binding Buffer to a total of 500 µl.
Ensure that the beads mix thoroughly with the phage/protein
solution.

Washing and Elution:

10a. Place the binding reaction into the Dynal magnet and let
sit for 1 min.

11a. Remove the solution and discard using a P1000 Pipetman.
Let the beads stand 30 sec to allow residual Binding Buffer to
collect and remove with a P200 Pipetman.

12a. Remove the tube from the magnet and resuspend the beads
in 750 µl of Wash Buffer and return to the magnet. Again let
the beads pellet by waiting 1 min.

13a. Remove the wash solution as in Step 11a and repeat this
process 3 more times.

14a. After the removal of the fourth wash, resuspend the beads
and transfer them to a fresh, labeled tube and wash 4 more
times.

15a. Elute and neutralize as in Step 15.

Amplification of Rounds 2, 3, & 4 Eluted Phage:

16a. Plate 10 µl and 100 µl of round 2,3,4 eluates and amplify
as in Steps 17-25.



6.5. AFFINITY MEASUREMENTS OF
     PEPTIDE-TARGET PROTEIN INTERACTIONS

Once peptides that bind to a target protein have been
identified, the affinities of these peptides to their
respective targets are measured by measuring the dissociation
constants (Kd) of each of these peptides to their respective
targets. Oligonucleotides that encode the peptides are
constructed so as to encode also an epitope tag fused to the
peptide (for example, the myc epitope) that can be detected by
a commercially available antibody. These oligonucleotides are
incubated with polysome extracts to produce the peptide tagged
with the epitope. Binding of the target protein to the
peptide is done in solution, and separation of the bound
peptide from the unbound peptide is done by immunoaffinity
purification using an anti-target protein antibody. This
immunoaffinity purification is done by a modified ELISA
(enzyme-linked immunosorbent assay) protocol, in which the
target protein-peptide mixture is exposed to the anti-target
protein antibody immobilized on a solid support such as a
nitrocellulose membrane, and the unbound peptide is then
washed off. In this protocol, the concentration of the target
protein is varied and then the amount of bound peptide is
estimated by detecting the epitope tag on the peptide by use
of anti-epitope antibody. In this manner, the affinity of
each peptide for its target protein can be determined.
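
The text does not state how a dissociation constant is extracted from the titration, so the following C sketch shows only one conventional possibility. It assumes a one-site binding model, signal = Bmax x [T] / (Kd + [T]), where [T] is the target protein concentration and the signal is the amount of bound peptide detected through the epitope tag, and it fits Kd and Bmax by linear regression of 1/signal against 1/[T]. The model, the fitting method, and all names are assumptions made for illustration. Double-reciprocal fits are sensitive to noise at low concentrations, so a nonlinear fit would usually be preferred for real data.

```c
#include <assert.h>
#include <stddef.h>

/* Fitted parameters of the assumed one-site binding model. */
typedef struct {
    double kd;    /* dissociation constant, same units as the concentrations */
    double bmax;  /* signal at saturating target concentration */
} BindingFit;

/* One-site model: signal = bmax * [T] / (Kd + [T]). */
double bound_signal(double target_conc, double kd, double bmax)
{
    return bmax * target_conc / (kd + target_conc);
}

/* Double-reciprocal fit: 1/signal = 1/bmax + (Kd/bmax) * (1/[T]).
   Ordinary least squares on (1/[T], 1/signal) gives slope and intercept. */
BindingFit fit_double_reciprocal(const double *conc, const double *signal,
                                 size_t n)
{
    double sx = 0.0, sy = 0.0, sxx = 0.0, sxy = 0.0;
    assert(n >= 2);
    for (size_t i = 0; i < n; i++) {
        assert(conc[i] > 0.0 && signal[i] > 0.0);
        double x = 1.0 / conc[i];
        double y = 1.0 / signal[i];
        sx += x;
        sy += y;
        sxx += x * x;
        sxy += x * y;
    }
    double denom = (double)n * sxx - sx * sx;
    assert(denom > 0.0); /* needs at least two distinct concentrations */
    double slope = ((double)n * sxy - sx * sy) / denom;
    double intercept = (sy - slope * sx) / (double)n;
    BindingFit fit;
    fit.bmax = 1.0 / intercept;
    fit.kd = slope / intercept;
    return fit;
}

/* Round-trip check on noise-free synthetic data; the five concentrations
   are hypothetical and carry no units from the source. */
BindingFit fit_noise_free_example(double kd, double bmax)
{
    const double conc[] = {0.5, 1.0, 2.0, 4.0, 8.0};
    double signal[5];
    for (size_t i = 0; i < 5; i++)
        signal[i] = bound_signal(conc[i], kd, bmax);
    return fit_double_reciprocal(conc, signal, 5);
}
```

On noise-free synthetic data the fit recovers the Kd and Bmax used to generate it, which is a useful sanity check before applying it to measured signals.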

6.6. REDOR MEASUREMENTS ON A CX6C PEPTIDE RESIN

This example demonstrates successful synthesis and
cyclization of a CX6C peptide resin of greater than 95% purity
and with a labeled glycine, followed by successful REDOR
distance measurements on the CX6C peptide resin using the
preferred REDOR methods of this invention. The labeled
peptide used was
Cys-Asn-Thr-Leu-Lys-(15N-2-13C)Gly-Asp-Cys-Gly-mBHA resin, where
a glycine linker attached the peptide of interest to the mBHA
resin. (Cys-Asn-Thr-Leu-Lys-Gly-Asp-Cys-Gly = SEQ ID NO:10)

The peptide resin was synthesized by solid phase
synthesis on p-methylbenzhydrylamine (mBHA) resin using a
combination of Boc and Fmoc chemistry. Methylbenzhydrylamine
resin (subst. 0.36 meq/g) was purchased from Advanced Chem
Tech (Louisville, KY). Fmoc-(15N-2-13C)Gly was prepared from
(15N-2-13C)Gly hydrochloride (Isotec Inc., Miamisburg, OH) and
Fmoc-OSu. Boc-Gly, Fmoc-Cys(Trt), Fmoc-Asp(OtBu),
Fmoc-Lys(Boc), Fmoc-Leu, Fmoc-Thr(OtBu), Fmoc-Asn and
Boc-Cys(Acm) were purchased from Bachem (Torrance, CA).
Reagent grade solvents were purchased from Fisher Scientific.
Diisopropylcarbodiimide (DIC), trifluoroacetic acid (TFA) and
diisopropylethylamine (DIEA) were purchased from Chem Impex
(Wooddale, IL). Nitrogen and HF were purchased from Air
Products (San Diego, CA).
The first step 43 was the synthesis of
Boc-Cys(Acm)-Asn-Thr(OtBu)-Leu-Lys(Boc)-Gly-Asp(OtBu)-
Cys(Trt)-Gly-mBHA resin. 1.11 g (0.40 meq) of mBHA resin were
placed in a 150 ml reaction vessel (glass filter at the
bottom) with methylene chloride (CH2Cl2) ["DCM"] and stirred 15
min with a gentle bubbling of nitrogen in order to swell the
resin. The solvent was drained and the resin was neutralized
with DIEA 5% in DCM (3 X 2 min). After washes with DCM, the
resin was coupled 60 min with Boc-Gly (0.280 g, 1.6 meq, 4-fold
excess, 0.1 M) and DIC (0.25 ml, 1.6 meq, 4-fold excess, 0.1 M)
in DCM. Completion of the coupling was checked with the
ninhydrin test. After washes, the resin was stirred 30 min in
TFA 55% in DCM in order to remove the Boc protecting group.
The resin was then neutralized with DIEA 5% in DCM and coupled
with Fmoc-Cys(Trt) (0.937 g, 1.6 meq, 4-fold excess, 0.1 M) and
DIC (0.25 ml, 1.6 meq, 4-fold excess, 0.1 M) in DCM/DMF (50/50).
After washes the resin was stirred with piperidine 20% in DMF
(5 min and 20 min) in order to remove the Fmoc group. After
washes, this same cycle was repeated with Fmoc-Asp(OtBu),
Fmoc-(15N-2-13C)Gly (2-fold excess only), Fmoc-Lys(Boc),
Fmoc-Leu, Fmoc-Thr(OtBu), Fmoc-Asn and Boc-Cys(Acm). After the
last coupling, the Boc group was left on the peptide. The
resin was washed thoroughly with DCM and dried under a
nitrogen stream. Yield was 1.49 g (expected: ~1.7 g).
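
The reagent quantities quoted for the coupling steps can be checked from the resin loading. The C sketch below is an illustration with hypothetical function names: it multiplies resin mass by substitution to get reactive sites, applies the fold excess, and converts to grams. It assumes that one milliequivalent equals one millimole for these monofunctional reagents. The molecular weights passed in (about 175.18 g/mol for Boc-Gly and 585.71 g/mol for Fmoc-Cys(Trt)) are calculated from the molecular formulas and are not given in the text.

```c
#include <assert.h>

/* Reactive sites on the resin, in mmol (taking 1 meq = 1 mmol). */
double resin_sites_mmol(double resin_g, double substitution_meq_per_g)
{
    assert(resin_g >= 0.0 && substitution_meq_per_g >= 0.0);
    return resin_g * substitution_meq_per_g;
}

/* Mass of reagent, in grams, for a given fold excess over resin sites. */
double reagent_mass_g(double sites_mmol, double fold_excess,
                      double mw_g_per_mol)
{
    assert(sites_mmol >= 0.0 && fold_excess > 0.0 && mw_g_per_mol > 0.0);
    return sites_mmol * fold_excess * mw_g_per_mol / 1000.0;
}

/* Solvent volume, in ml, giving the stated molar concentration:
   mmol divided by mol/l is numerically ml. */
double coupling_volume_ml(double reagent_mmol, double molar_conc)
{
    assert(molar_conc > 0.0);
    return reagent_mmol / molar_conc;
}
```

Using 1.11 g of resin at 0.36 meq/g gives about 0.40 mmol of sites; a 4-fold excess is 1.6 mmol, which corresponds to roughly 0.280 g of Boc-Gly or 0.937 g of Fmoc-Cys(Trt) in about 16 ml of solvent at 0.1 M, consistent with the amounts in the text.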

The next step 44 was cyclization of the
Boc-Cys-Asn-Thr(OtBu)-Leu-Lys(Boc)-Gly-Asp(OtBu)-Cys-Gly-mBHA
resin. 600 mg of protected peptide resin were sealed in a
polypropylene mesh packet. The bag was shaken in a mixture of
solvent (DCM/methanol/water, 640/280/47) in order to swell the
resin. The bag was then shaken 20 min in 100 ml of a solution
of iodine in the same mixture of solvent (0.4 mg I2/ml solvent
mixture). This operation was performed 4 times. No
decoloration was observed after the third time. The resin was
then thoroughly washed with DCM, DMF, DCM, and methanol
successively.

The last step 45 was side-chain deprotection of the
Cys-Asn-Thr-Leu-Lys-Gly-Asp-Cys-Gly-mBHA resin. After
cyclization the resin in the polypropylene bag was reacted 1.5
hour with 100 ml of a mixture of TFA/p-cresol/water
(95/2.5/2.5). After washes with DCM and methanol, the resin was
dried 48 hours under vacuum. Yield was 560 mg.

The resulting peptide resin was analyzed for its purity
and the presence of the disulfide bridge. 40 mg of resin were
sealed in a polypropylene mesh packet and treated with HF at
0°C for 1 hour in the presence of anisole (HF/anisole: 90/10).
The scavenger and by-products were extracted from the resin
with cold ethyl ether. The peptide was extracted with 10%
acetic acid and lyophilized 36 hours. The dry isolated peptide
was characterized by PDMS (mass spectrography) and HPLC (high
performance liquid chromatography). This analysis
demonstrated that greater than 95% of the product peptide was
of the correct amino acid composition, having a disulfide loop
and without inter-molecular disulfide dimers.

REDOR measurements were made on the peptide resin
prepared by this method, and as a control, also on dried
(15N-2-13C) labeled glycine. The preferred REDOR methods and
parameters, as previously detailed, were used. Fig. 6
illustrates the 15N resonance spectral signals obtained.
Signal 70 is the signal produced by dried glycine after no
rotor periods. Signals 71, 72, 73 are glycine signals after
2, 4, and 8 rotor periods, respectively. Signals 74, 75, 76,
and 77 are the peptide resin signals after 0, 2, 4, and 8
rotor periods, respectively.

Fig. 7 illustrates the data analysis. As in Fig. 5, axis
81 is the ΔS/S axis, and axis 82 is the X axis. The variables
are as used in equation 5. Graph 83 is defined by equation 5,
and is the initial rising part of the full curve shown in Fig.
5. Data points 84, 85, 86, and 87 are best fits of the data
for 0, 2, 4, and 8 rotor periods, respectively. At these
points, the circles represent the glycine values and the
squares the peptide resin values. These values correspond to
a C-N distance in glycine and the peptide of 1.55 Å (and a DCN
of 800 Hz). Repeated measurements gave a C-N distance of
1.50 Å (and a DCN of 875 Hz). The accepted distance in glycine
is 1.48 Å. The above procedure was repeated for (15N-1-13C)
labeled glycine in
Cys-Asn-Thr-Leu-Lys-(15N-1-13C)Gly-Asp-Cys-Gly-mBHA resin, and
the measured C-N distance of 2.50 Å is in excellent agreement
with the predicted value of 2.46 Å.

Thus REDOR accuracy to better than 0.1 Å is demonstrated.
Also demonstrated is the peptide resin as an appropriate
substrate for NMR measurements. Inter-molecular dipole-dipole
interactions between adjacent peptides did not interfere.
Also the overlap of the distances measured in free glycine and
in glycine incorporated in the peptide demonstrated that the
peptide was held sufficiently rigidly by the resin that any
remaining peptide motions did not interfere with the NMR
measurements.

7. SPECIFIC EMBODIMENTS, CITATION OF REFERENCES

The present invention is not to be limited in scope by
the specific embodiments described herein. Indeed, various
modifications of the invention in addition to those described
herein will become apparent to those skilled in the art from
the foregoing description and accompanying figures. Such
modifications are intended to fall within the scope of the
appended claims.

Various publications are cited herein, the disclosures of 
which are incorporated by reference in their entireties. 



8. COMPUTER PROGRAM LISTINGS

These computer program listings are copyright 1995 of
CuraGen, Inc. © 1995 CuraGen, Inc.

START OF LISTING.

C CODE ROUTINES
MAKEFILE AND GO PROC

MAKEFILE:

OPTIONS=-mips2 -ansi -g -fullwarn -O0

peptide.ex: random.o peptide.o peptide1.o peptide2.o peptide3.o peptide4.o \
	peptide5.o peptide6.o peptide7.o
	cc $(OPTIONS) random.o peptide*.o -lm -o peptide.ex
random.o: random.c
	cc $(OPTIONS) -c random.c
peptide.o: peptide.c *.h
	cc $(OPTIONS) -c peptide.c
peptide1.o: peptide1.c *.h
	cc $(OPTIONS) -c peptide1.c
peptide2.o: peptide2.c *.h
	cc $(OPTIONS) -c peptide2.c
peptide3.o: peptide3.c *.h
	cc $(OPTIONS) -c peptide3.c
peptide4.o: peptide4.c *.h
	cc $(OPTIONS) -c peptide4.c
peptide5.o: peptide5.c *.h
	cc $(OPTIONS) -c peptide5.c
peptide6.o: peptide6.c *.h
	cc $(OPTIONS) -c peptide6.c
peptide7.o: peptide7.c *.h
	cc $(OPTIONS) -c peptide7.c



GO PROC:

peptide.ex << EOF
0.1
1
CGGGGGGC
EOF



MAIN PROGRAM - PEPTIDE.C

#define MAIN
#include "peptide.h"

/* The main program stub */
void main(int argc, char *argv[], char *envp[])
{
  logical *cyclic;
  int n_peptides, max_atoms_per_unit;
  int *n_amino_acids, *n_atoms_total, *n_side, *n_main;
  rigid_unit **peptide;
  torsion_list **torsion;
  hbond_list **hbond;
  atom_list **atom, **atom2;
  atom_info **atom_tmp;
  vector *twig[KMAX];
  int ***bond_table;
  string *sequence;
  int i, j;
  int list_num, max_atoms_total;
  double seed;
  regrowth **main, **side;

  printf("Enter random number seed ");
  scanf("%lf", &seed);
  ran2(seed);
  /* get linear sequences */
  get_sequence(&sequence, &n_peptides);
  printf("\n");
  /* allocate memory for arrays */
  if ((peptide = (rigid_unit **)
       malloc(n_peptides*sizeof(rigid_unit *))) == NULL)
    out_of_memory();
  if ((torsion = (torsion_list **)
       malloc(n_peptides*sizeof(torsion_list *))) == NULL)
    out_of_memory();
  if ((hbond = (hbond_list **)
       malloc(n_peptides*sizeof(hbond_list *))) == NULL)
    out_of_memory();
  if ((atom = (atom_list **)
       malloc(n_peptides*sizeof(atom_list *))) == NULL)
    out_of_memory();
  if ((atom2 = (atom_list **)
       malloc(n_peptides*sizeof(atom_list *))) == NULL)
    out_of_memory();
  if ((atom_tmp = (atom_info **)
       malloc(n_peptides*sizeof(atom_info *))) == NULL)
    out_of_memory();
  if ((main = (regrowth **)
       malloc(n_peptides*sizeof(regrowth *))) == NULL)
    out_of_memory();
  if ((side = (regrowth **)
       malloc(n_peptides*sizeof(regrowth *))) == NULL)
    out_of_memory();
  if ((bond_table = (int ***)
       malloc(n_peptides*sizeof(int **))) == NULL)
    out_of_memory();
  if ((n_amino_acids = (int *) malloc(n_peptides*sizeof(int))) == NULL)
    out_of_memory();
  if ((n_atoms_total = (int *) malloc(n_peptides*sizeof(int))) == NULL)
    out_of_memory();
  if ((cyclic = (logical *) malloc(n_peptides*sizeof(logical))) == NULL)
    out_of_memory();
  if ((n_main = (int *) malloc(n_peptides*sizeof(int))) == NULL)
    out_of_memory();
  if ((n_side = (int *) malloc(n_peptides*sizeof(int))) == NULL)
    out_of_memory();
  for (i=0; i<n_peptides; i++) {
    n_amino_acids[i] = (int) strlen(sequence[i]);
  }

  /* read in parameter files */
  read_torsion_data();
  read_lj_data();
  read_hbond_data();
  max_atoms_per_unit = 0;
  /* read in geometric sequence information */
  max_atoms_total = 0;
  for (i=0; i<n_peptides; i++) {
    peptide[i] = read_peptide_data(sequence[i], &n_atoms_total[i],
                                   &max_atoms_per_unit);
    cyclic[i] = (n_amino_acids[i] > 1) && (sequence[i][0] == 'C')
                && (sequence[i][n_amino_acids[i]-1] == 'C');
    if (cyclic[i]) peptide[i] = modify_cystine_ends(peptide[i],
                                                    n_amino_acids[i],
                                                    &n_atoms_total[i]);
    if (n_atoms_total[i] > max_atoms_total) max_atoms_total =
                                              n_atoms_total[i];
    n_main[i] = (cyclic[i]) ? 2*n_amino_acids[i] + 3 :
                              2*n_amino_acids[i] + 1;
    n_side[i] = n_amino_acids[i];
  }

  /* allocate sub arrays */
  for (i=0; i<KMAX; i++)
    if ((twig[i] = (vector *)
         malloc(max_atoms_total*sizeof(vector)))
        == NULL) out_of_memory();
  for (i=0; i<n_peptides; i++) {
    if ((atom[i] = (atom_list *)
         malloc(n_atoms_total[i]*sizeof(atom_list)))
        == NULL) out_of_memory();
    if ((atom2[i] = (atom_list *)
         malloc(n_atoms_total[i]*sizeof(atom_list)))
        == NULL) out_of_memory();
    if ((atom_tmp[i] = (atom_info *)
         malloc(n_atoms_total[i]*sizeof(atom_info)))
        == NULL) out_of_memory();
    if ((main[i] = (regrowth *)
         malloc(n_main[i]*sizeof(regrowth))) == NULL)
      out_of_memory();
    if ((side[i] = (regrowth *)
         malloc(n_side[i]*sizeof(regrowth))) == NULL)
      out_of_memory();
    if ((bond_table[i] = (int **)
         malloc(n_atoms_total[i]*sizeof(int *)))
        == NULL) out_of_memory();
    for (j=0; j<n_atoms_total[i]; j++)
      if ((bond_table[i][j] = (int *)
           malloc(MAX_BONDS*sizeof(int)))
          == NULL) out_of_memory();
  }

  /* loop over all peptides */
  for (i=0; i<n_peptides; i++) {
    get_main_side(peptide[i], main[i], side[i], &n_main[i],
                  &n_side[i]);
    /* determine connections */
    initialize_connection_table(bond_table[i], n_atoms_total[i]);
    list_num = 0;
    make_connection_table(bond_table[i], &list_num, peptide[i],
                          peptide[i]);
    /*print_connection_table(bond_table[i], n_atoms_total[i]);*/
    list_num = 0;
    /* assign noncoordinate information in atom array */
    assign_atom_pointers(&list_num, peptide[i], peptide[i],
                         atom[i]);
    /* get H-bonds and torsion lists */
    get_hbonds(&hbond[i], atom[i], n_atoms_total[i]);
    /*print_hbonds(hbond[i], atom[i]);*/
    list_num = 0;
    torsion[i] = NULL;
    get_torsions(&torsion[i], bond_table[i], &list_num, atom[i],
                 peptide[i], peptide[i]);
    assign_lj_parameters(peptide[i], peptide[i]);
    /* copy noncoordinate information in atom to atom2 */
    for (j=0; j<n_atoms_total[i]; j++) atom2[i][j] = atom[i][j];
  }

  /* do the Monte Carlo */
  do_mc(peptide[0], torsion[0], hbond[0], atom[0], atom2[0],
        atom_tmp[0],
        twig, main[0], side[0], n_amino_acids[0],
        n_atoms_total[0], n_main[0], n_side[0], cyclic[0]);
  /*print_torsions(torsion[0], atom[0]);*/
  write_car_file(n_amino_acids[0], n_atoms_total[0], atom[0],
                 "test.car");
}

#undef MAIN
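
The main program treats a peptide as cyclic when it has more than one residue and both its first and last one-letter codes are C, and it sizes the main-chain regrowth array differently in the two cases. A minimal stand-alone sketch of that rule follows. The function names are illustrative and are not part of the listings.

```c
#include <assert.h>
#include <string.h>

/* Returns 1 when the sequence has more than one residue and both the
   first and last residues are cysteine ('C'), otherwise 0. */
int demo_is_cyclic(const char *sequence)
{
  size_t n;
  assert(sequence != NULL);
  n = strlen(sequence);
  return n > 1 && sequence[0] == 'C' && sequence[n - 1] == 'C';
}

/* Number of main-chain regrowth slots reserved for a sequence:
   2n+3 for a cyclic peptide and 2n+1 for a linear one. */
int demo_main_chain_slots(const char *sequence)
{
  int n = (int) strlen(sequence);
  return demo_is_cyclic(sequence) ? 2 * n + 3 : 2 * n + 1;
}
```

For the eight-residue sequence CGGGGGGC used in the GO PROC, this gives a cyclic peptide with 19 main-chain slots.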

**********************************************************************

INPUT/OUTPUT ROUTINES - PEPTIDE1.C

/* input/output routines */
#include "peptide.h"

/* hardcoded AMBER rules have the keyword AMBER nearby
*/

#define NT_CT_DISTANCE 1.4750
#define S_S_DISTANCE 2.0380
#define P_CHARGE 0.048
#define C_CHARGE1 -0.098
#define C_CHARGE2 0.050
#define C_CHARGE3 0.050
#define C_CHARGE4 0.824
#define C_CHARGE5 -0.405
#define C_CHARGE6 -0.405

/* This function is called when out of memory
*/
void out_of_memory(void)
{
  printf("Out of memory error\n");
  exit(1);
}

/* This routine returns the 1-letter amino acid sequences
*/
void get_sequence(string **sequence, int *n_peptides)
{
#define SEQUENCE_LENGTH 80
  int i;

  printf("Enter number of peptides: ");
  scanf("%d", n_peptides);
  if ((*sequence = (string *) malloc(*n_peptides*sizeof(string)))
      == NULL)
    out_of_memory();
  for (i=0; i<*n_peptides; i++)
    if (((*sequence)[i] = (string)
         malloc(SEQUENCE_LENGTH*sizeof(char)))
        == NULL) out_of_memory();
  for (i=0; i<*n_peptides; i++) {
    printf("Enter peptide sequence %d: ", i);
    scanf("%s", (*sequence)[i]);
  }
#undef SEQUENCE_LENGTH
}
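
The sequence buffers hold 80 characters, while the listing reads each sequence with an unbounded %s conversion. As a hedged illustration only, the sketch below shows how a field width keeps a read within such a buffer. It parses from a string rather than standard input so it can be exercised in isolation, and the names are hypothetical.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define DEMO_SEQUENCE_LENGTH 80

/* Copy the first whitespace-delimited token of input into out.
   At most DEMO_SEQUENCE_LENGTH-1 characters are stored, so the
   terminating null always fits. Returns the stored length, or -1
   if input contains no token. */
int demo_read_sequence(const char *input, char out[DEMO_SEQUENCE_LENGTH])
{
  assert(input != NULL && out != NULL);
  out[0] = '\0';
  if (sscanf(input, "%79s", out) != 1)   /* 79 = buffer size minus 1 */
    return -1;
  return (int) strlen(out);
}
```

The width in the format string has to be kept in step with the buffer size by hand, which is the main drawback of this approach.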

/* read in the data files associated with this sequence
*/
rigid_unit *read_peptide_data(string sequence, int *n_atoms_total,
                              int *max_atoms_per_unit)
{
  int i, n_amino_acids;
  char name[]="?.dat";
  acid_label label;
  rigid_unit *u1, *u2, *ret;

  /* check amino acids in sequence */
  n_amino_acids = strlen(sequence);
  for (i=0; i<n_amino_acids; i++) {
    label = amino_acid_code(sequence[i]);
    if (label == BAD) {
      printf("Invalid amino acid code %c\n", sequence[i]);
      exit(1);
    }
    if (label == P) {
      printf("Proline not yet supported\n");
      exit(1);
    }
  }
  *n_atoms_total = 0;
  /* add unit A */
  label = amino_acid_code(sequence[0]);
  u1 = read_unit("unitA.dat", label, 0, n_atoms_total,
                 max_atoms_per_unit);
  ret = u1;
  for (i=0; i<n_amino_acids; i++) {
    name[0] = sequence[i];
    label = amino_acid_code(sequence[i]);
    /* add unit B */
    u2 = read_unit("unitB.dat", label, i, n_atoms_total,
                   max_atoms_per_unit);
    u2->type = nonCunit;
    /* follow IUPAC naming rules if glycine */
    if (label == G) strcpy(u2->atom[1].name, "HA1");
    /* follow AMBER charge rules if alanine or proline */
    if (label == A || label == P) u2->atom[1].charge = P_CHARGE;
    if (i==0) u2->head.axis = vector_scale(u2->head.axis,
                                           NT_CT_DISTANCE);
    couple_unit(u1, u2);
    u1 = u2;
    /* add residue */
    u2 = read_unit(name, label, i, n_atoms_total,
                   max_atoms_per_unit);
    couple_unit(u1, u2);
    /* add unit C or D */
    u2 = read_unit((i==n_amino_acids-1) ? "unitD.dat" :
                                          "unitC.dat",
                   label, i, n_atoms_total, max_atoms_per_unit);
    if (i < n_amino_acids-1) {
      /* align incoming and outgoing bonds */
      u2->bond[0]->tail.axis = vector_scale(u2->head.axis, 1.0);
      u2->type = Cunit;
      label = amino_acid_code(sequence[i+1]);
      u2->atom[2].residue = u2->atom[3].residue = label;
      u2->atom[2].residue_num = u2->atom[3].residue_num = i+1;
    }
    couple_unit(u1, u2);
    u1 = u2;
  }
  return (ret);
}

/* This routine reads in a rigid unit data file
*/
rigid_unit *read_unit(string file, acid_label label, int residue_num,
                      int *n_atoms_total, int *max_atoms_per_unit)
{
#define LINE_LEN 200
  FILE *fp;
  int i, j, k, i1, n_rigid_units;
  char stmp1[NAME_LENGTH], stmp2[NAME_LENGTH], line[LINE_LEN];
  rigid_unit **utmp;

  if ((fp = fopen(file, "r")) == NULL) {
    printf("Data file %s does not exist\n", file);
    exit(1);
  }
  /* read in number of rigid units */
  getline(line, LINE_LEN, fp);
  sscanf(line, "%d", &n_rigid_units);
  /* printf("%d\n",n_rigid_units); */
  if ((utmp = (rigid_unit **)
       malloc(n_rigid_units*sizeof(rigid_unit *))) == NULL)
    out_of_memory();
  /* allocate rigid unit */
  for (i=0; i<n_rigid_units; i++) {
    if ((utmp[i] = (rigid_unit *)
         malloc(sizeof(rigid_unit))) == NULL) out_of_memory();
    utmp[i]->type = UNKNOWN;
    getline(line, LINE_LEN, fp);
    sscanf(line, "%d", &utmp[i]->n_atoms);
    *n_atoms_total += utmp[i]->n_atoms;
    if (utmp[i]->n_atoms > *max_atoms_per_unit)
      *max_atoms_per_unit = utmp[i]->n_atoms;
    /* printf("%d\n",utmp[i]->n_atoms); */
    if ((utmp[i]->atom = (atom_info *)
         malloc(utmp[i]->n_atoms*sizeof(atom_info))) == NULL)
      out_of_memory();
    /* read in atoms */
    for (j=0; j<utmp[i]->n_atoms; j++) {
      getline(line, LINE_LEN, fp);
      sscanf(line, "%s %lf %lf %lf %s %d %s %s %lf",
             utmp[i]->atom[j].name,
             &utmp[i]->atom[j].position.x,
             &utmp[i]->atom[j].position.y,
             &utmp[i]->atom[j].position.z,
             stmp1, &i1,
             utmp[i]->atom[j].type, stmp2,
             &utmp[i]->atom[j].charge);
      /* printf("%s %lf %lf %lf %s %lf\n",
                utmp[i]->atom[j].name,
                utmp[i]->atom[j].position.x,
                utmp[i]->atom[j].position.y,
                utmp[i]->atom[j].position.z,
                utmp[i]->atom[j].type,
                utmp[i]->atom[j].charge); */
      utmp[i]->atom[j].residue = label;
      utmp[i]->atom[j].residue_num = residue_num;
    }
  }
  for (i=0; i<n_rigid_units; i++) {
    /* allocate incoming bond vector information */
    getline(line, LINE_LEN, fp);
    sscanf(line, "%d %d %d %d %d", &i1, &utmp[i]->head.bond[0],
           &utmp[i]->head.bond[1], &utmp[i]->head.bond[2],
           &utmp[i]->head.bond[3]);
    /* printf("%d %d %d %d %d\n", i1, utmp[i]->head.bond[0],
              utmp[i]->head.bond[1], utmp[i]->head.bond[2],
              utmp[i]->head.bond[3]); */
    for (j=4; j<MAX_BONDS; j++) utmp[i]->head.bond[j] = -1;
    utmp[i]->head.atom_num = i1;
    getline(line, LINE_LEN, fp);
    sscanf(line, "%lf %lf %lf", &utmp[i]->head.axis.x,
           &utmp[i]->head.axis.y,
           &utmp[i]->head.axis.z);
    /* printf("%lf %lf %lf\n", utmp[i]->head.axis.x,
              utmp[i]->head.axis.y,
              utmp[i]->head.axis.z); */
    utmp[i]->head.axis.x =
      utmp[i]->atom[i1].position.x - utmp[i]->head.axis.x;
    utmp[i]->head.axis.y =
      utmp[i]->atom[i1].position.y - utmp[i]->head.axis.y;
    utmp[i]->head.axis.z =
      utmp[i]->atom[i1].position.z - utmp[i]->head.axis.z;
    /* allocate outgoing bond pointers */
    getline(line, LINE_LEN, fp);
    sscanf(line, "%d", &utmp[i]->n_bonds);
    if ((utmp[i]->bond = (bond_type **)
         malloc(utmp[i]->n_bonds*sizeof(bond_type *))) == NULL)
      out_of_memory();
    for (j=0; j<utmp[i]->n_bonds; j++) {
      if ((utmp[i]->bond[j] = (bond_type *)
           malloc(sizeof(bond_type))) == NULL)
        out_of_memory();
      getline(line, LINE_LEN, fp);
      sscanf(line, "%d", &i1);
      /* printf("%d\n",i1); */
      utmp[i]->bond[j]->next = (i1==-1) ? NULL : utmp[i1];
      getline(line, LINE_LEN, fp);
      sscanf(line, "%d %d %d %d %d", &i1,
             &utmp[i]->bond[j]->tail.bond[0],
             &utmp[i]->bond[j]->tail.bond[1],
             &utmp[i]->bond[j]->tail.bond[2],
             &utmp[i]->bond[j]->tail.bond[3]);
      /* printf("%d %d %d %d %d\n", i1,
                utmp[i]->bond[j]->tail.bond[0],
                utmp[i]->bond[j]->tail.bond[1],
                utmp[i]->bond[j]->tail.bond[2],
                utmp[i]->bond[j]->tail.bond[3]); */
      for (k=4; k<MAX_BONDS; k++) utmp[i]->bond[j]->tail.bond[k]
                                    = -1;
      utmp[i]->bond[j]->tail.atom_num = i1;
      getline(line, LINE_LEN, fp);
      sscanf(line, "%lf %lf %lf", &utmp[i]->bond[j]->tail.axis.x,
             &utmp[i]->bond[j]->tail.axis.y,
             &utmp[i]->bond[j]->tail.axis.z);
      utmp[i]->bond[j]->tail.axis.x -=
        utmp[i]->atom[i1].position.x;
      utmp[i]->bond[j]->tail.axis.y -=
        utmp[i]->atom[i1].position.y;
      utmp[i]->bond[j]->tail.axis.z -=
        utmp[i]->atom[i1].position.z;
      utmp[i]->bond[j]->tail.axis =
        vector_scale(utmp[i]->bond[j]->tail.axis, 1.0);
    }
  }
  fclose(fp);
  return (utmp[0]);
#undef LINE_LEN
}

/* This routine couples two rigid units
*/
void couple_unit(rigid_unit *unit1, rigid_unit *unit2)
{
  bond_type **bond;

  /* advance to the first outgoing bond that is not yet connected */
  for (bond=unit1->bond; bond[0]->next; bond++) ;
  bond[0]->next = unit2;
}
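
The coupling step attaches a unit to the first outgoing bond of its parent that does not yet point anywhere. The following sketch illustrates the idea with stripped-down, hypothetical stand-ins for the rigid_unit and bond_type structures. Unlike the listing, it also checks the number of slots, which is an added safeguard and not something the original code is known to do.

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_MAX_SLOTS 4

/* Hypothetical simplified unit: each outgoing slot either points to
   the next unit or is NULL while still unconnected. */
typedef struct demo_unit {
  int id;
  int n_bonds;                              /* slots in use, <= DEMO_MAX_SLOTS */
  struct demo_unit *next[DEMO_MAX_SLOTS];
} demo_unit;

/* Attach child to the first free outgoing slot of parent.
   Returns the slot index used, or -1 if every slot is occupied. */
int demo_couple(demo_unit *parent, demo_unit *child)
{
  int i;
  assert(parent != NULL && child != NULL);
  for (i = 0; i < parent->n_bonds; i++) {
    if (parent->next[i] == NULL) {
      parent->next[i] = child;
      return i;
    }
  }
  return -1;
}
```

Coupling a side group and then the next backbone unit to the same parent fills slot 0 and slot 1 in that order, which mirrors how the residue and the C or D unit are both coupled to the B unit when the peptide is assembled.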

/* This routine turns a linear CX_nC peptide into a cyclic
   disulfide-bonded peptide
*/
rigid_unit *modify_cystine_ends(rigid_unit *unit, int n_amino_acids,
                                int *n_atoms_total)
{
  int i;
  rigid_unit *unit1, *unit2, *unit3, *unit4, *unit5, *unit6;
  double len;
  vector head1, head2;
  bond_type *btmp;

  /* get new first unit */
  unit1 = unit->bond[0]->next;
  unit2 = unit1->bond[0]->next;
  unit3 = unit2->bond[0]->next;
  /* save head vectors */
  head1 = unit1->head.axis;
  head2 = unit2->head.axis;
  /* modify A unit to be a side group */
  len = vector_length(unit1->head.axis);
  unit->head = unit->bond[0]->tail;
  unit->head.axis.x *= -len;
  unit->head.axis.y *= -len;
  unit->head.axis.z *= -len;
  unit->n_bonds = 0;
  /* modify C_alpha head */
  len = vector_length(unit2->head.axis);
  unit1->head = unit1->bond[0]->tail;
  unit1->head.axis.x *= -len;
  unit1->head.axis.y *= -len;
  unit1->head.axis.z *= -len;
  /* modify C_beta head */
  len = vector_length(unit3->head.axis);
  unit2->head = unit2->bond[0]->tail;
  unit2->head.axis.x *= -len;
  unit2->head.axis.y *= -len;
  unit2->head.axis.z *= -len;
  /* modify S tail */
  unit3->bond = unit->bond;
  unit3->head.bond[2] = -1;
  unit3->bond[0]->tail = unit3->head;
  unit3->bond[0]->tail.axis = vector_scale(unit3->head.axis,
                                           -1.0);
  unit3->bond[0]->next = unit2;
  unit3->n_bonds = 1;
  unit3->n_atoms--;
  (*n_atoms_total)--;
  /* modify S head */
  unit3->head.axis = unit3->atom[0].position;
  unit3->head.axis.x -= unit3->atom[3].position.x;
  unit3->head.axis.y -= unit3->atom[3].position.y;
  unit3->head.axis.z -= unit3->atom[3].position.z;
  /* modify C_beta tail */
  unit2->bond[0]->tail.axis = vector_scale(head2, -1.0);
  unit2->bond[0]->next = unit1;
  /* modify C_alpha tail */
  unit1->bond[0]->tail.axis = vector_scale(head1, -1.0);
  unit1->bond[0]->next = unit;
  unit4 = unit1;
  /* find last B unit */
  for (i=1; i<n_amino_acids; i++) {
    unit4 = unit4->bond[unit4->n_bonds-1]->next;
    unit4 = unit4->bond[unit4->n_bonds-1]->next;
  }
  unit5 = unit4->bond[0]->next;
  unit6 = unit5->bond[0]->next;
  /* swap bond0 and bond1 for unit 4 */
  btmp = unit4->bond[0];
  unit4->bond[0] = unit4->bond[1];
  unit4->bond[1] = btmp;
  /* modify S tail */
  if ((unit6->bond = (bond_type **) malloc(sizeof(bond_type *)))
      == NULL)
    out_of_memory();
  if ((unit6->bond[0] = (bond_type *) malloc(sizeof(bond_type)))
      == NULL)
    out_of_memory();
  unit6->head.bond[2] = -1;
  unit6->bond[0]->tail = unit6->head;
  unit6->bond[0]->next = unit3;
  unit6->n_bonds = 1;
  unit6->n_atoms--;
  (*n_atoms_total)--;
  unit6->bond[0]->tail.axis = unit6->atom[3].position;
  unit6->bond[0]->tail.axis.x -= unit6->atom[0].position.x;
  unit6->bond[0]->tail.axis.y -= unit6->atom[0].position.y;
  unit6->bond[0]->tail.axis.z -= unit6->atom[0].position.z;
  unit6->bond[0]->tail.axis =
    vector_scale(unit6->bond[0]->tail.axis, 1.0);
  /* use AMBER S-S bond length */
  unit3->head.axis = vector_scale(unit3->head.axis, S_S_DISTANCE);
  /* modify cystine S types to obey AMBER rules */
  strcpy(unit3->atom[0].type, "S");
  strcpy(unit6->atom[0].type, "S");
  /* modify cystine charges to obey AMBER rules */
  return (unit3);
}

/* This routine determines the main and side unit pointers
*/
void get_main_side(rigid_unit *unit, regrowth *main, regrowth *side,
                   int *n_main, int *n_side)
{
  rigid_unit *start, *unit2, *lastmain;
  regrowth *main0;
  int i;

  main0 = main;
  *n_side = 0;
  *n_main = 0;
  start = unit;
  lastmain = NULL;
  do {
    main->unit = unit;
    main->prev = lastmain;
    main++;
    (*n_main)++;
    for (i=0; i<unit->n_bonds-1; i++) {
      unit2 = unit->bond[i]->next;
      if (unit2->atom[0].residue != G) {
        side->unit = unit2;
        side->prev = unit;
        side++;
        (*n_side)++;
      }
    }
    lastmain = unit;
    unit = unit->bond[i]->next;
  } while (start != unit && unit->n_bonds > 0);
  if (unit->n_bonds == 0) {
    main->unit = unit;
    main->prev = lastmain;
    main++;
    (*n_main)++;
  } else {
    main0->prev = lastmain;
  }
}

/* This routine reads in the torsion data file
*/
void read_torsion_data(void)
{
#define LINE_LEN 200
  FILE *fp;
  char line[LINE_LEN];
  int n_torsions, itmp, i;
  double ftmp;
  torsion_data **data;

  if ((fp = fopen("torsion.dat", "r")) == NULL) {
    printf("Data file torsion.dat does not exist\n");
    exit(1);
  }
  getline(line, LINE_LEN, fp);
  sscanf(line, "%d", &n_torsions);
  if ((torsion_data_list = (torsion_data **)
       malloc((n_torsions+1)*sizeof(torsion_data *))) == NULL)
    out_of_memory();
  data = torsion_data_list;
  data[n_torsions] = NULL;
  for (i=0; i<n_torsions; i++) {
    if ((data[i] = (torsion_data *) malloc(sizeof(torsion_data)))
        == NULL)
      out_of_memory();
    getline(line, LINE_LEN, fp);
    sscanf(line, "%lf %d %s %s %s %s %lf %lf %lf %lf %lf %lf",
           &ftmp, &itmp, data[i]->type1,
           data[i]->type2, data[i]->type3, data[i]->type4,
           &data[i]->v0[0], &data[i]->phi0[0],
           &data[i]->v0[1], &data[i]->phi0[1],
           &data[i]->v0[2], &data[i]->phi0[2]);
    data[i]->phi0[0] *= PI/180.0;
    data[i]->phi0[1] *= PI/180.0;
    data[i]->phi0[2] *= PI/180.0;
  }
  fclose(fp);
#undef LINE_LEN
}
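
Each torsion record is read as two leading numeric fields, four atom type names, and three amplitude and phase pairs, with the phases converted from degrees to radians. The sketch below parses one such record from a string under those assumptions. The structure, the 8-character type names, and the sample values used to exercise it are hypothetical, since the actual torsion.dat contents and the torsion_data definition are not shown in this part of the listing.

```c
#include <assert.h>
#include <stdio.h>

#define DEMO_PI 3.14159265358979323846
#define DEMO_TYPE_LEN 8

/* Hypothetical stand-in for the torsion_data record. */
typedef struct {
  char type1[DEMO_TYPE_LEN], type2[DEMO_TYPE_LEN];
  char type3[DEMO_TYPE_LEN], type4[DEMO_TYPE_LEN];
  double v0[3];     /* amplitudes, units as given in the data file */
  double phi0[3];   /* phases, converted to radians on input */
} demo_torsion;

/* Parse one record. Returns 1 on success, 0 if any field is missing. */
int demo_parse_torsion(const char *line, demo_torsion *t)
{
  double ftmp;
  int itmp, n, m;
  assert(line != NULL && t != NULL);
  n = sscanf(line, "%lf %d %7s %7s %7s %7s %lf %lf %lf %lf %lf %lf",
             &ftmp, &itmp, t->type1, t->type2, t->type3, t->type4,
             &t->v0[0], &t->phi0[0],
             &t->v0[1], &t->phi0[1],
             &t->v0[2], &t->phi0[2]);
  if (n != 12) return 0;                    /* incomplete record */
  for (m = 0; m < 3; m++) t->phi0[m] *= DEMO_PI / 180.0;
  return 1;
}
```

Checking the sscanf return value, as done here, is an addition for illustration; the listing itself does not test it.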

/* This routine reads in the Lennard-Jones data file
*/
void read_lj_data(void)
{
#define LINE_LEN 200
  FILE *fp;
  char line[LINE_LEN];
  int n_terms, itmp, i;
  double ftmp;
  lj_data **data;

  if ((fp = fopen("lj_param.dat", "r")) == NULL) {
    printf("Data file lj_param.dat does not exist\n");
    exit(1);
  }
  getline(line, LINE_LEN, fp);
  sscanf(line, "%d", &n_terms);
  if ((lj_data_list = (lj_data **)
       malloc((n_terms+1)*sizeof(lj_data *)))
      == NULL) out_of_memory();
  data = lj_data_list;
  data[n_terms] = NULL;
  for (i=0; i<n_terms; i++) {
    if ((data[i] = (lj_data *) malloc(sizeof(lj_data))) == NULL)
      out_of_memory();
    getline(line, LINE_LEN, fp);
    sscanf(line, "%lf %d %s %lf %lf", &ftmp, &itmp, data[i]->type,
           &data[i]->ri, &data[i]->ei);
  }
  fclose(fp);
#undef LINE_LEN
}

/* This routine reads in the H-bond data file
*/
void read_hbond_data(void)
{
#define LINE_LEN 200
  FILE *fp;
  char line[LINE_LEN];
  int n_terms, itmp, i;
  double ftmp;
  hbond_data **data;

  if ((fp = fopen("hbond.dat", "r")) == NULL) {
    printf("Data file hbond.dat does not exist\n");
    exit(1);
  }
  getline(line, LINE_LEN, fp);
  sscanf(line, "%d", &n_terms);
  if ((hbond_data_list = (hbond_data **)
       malloc((n_terms+1)*sizeof(hbond_data *))) == NULL)
    out_of_memory();
  data = hbond_data_list;
  data[n_terms] = NULL;
  for (i=0; i<n_terms; i++) {
    if ((data[i] = (hbond_data *) malloc(sizeof(hbond_data))) ==
        NULL)
      out_of_memory();
    getline(line, LINE_LEN, fp);
    sscanf(line, "%lf %d %s %s %lf %lf",
           &ftmp, &itmp, data[i]->type1,
           data[i]->type2, &data[i]->a, &data[i]->b);
  }
  fclose(fp);
#undef LINE_LEN
}

/* write out the BIOSYM car files associated with this sequence
*/
void write_car_file(int n_amino_acids, int n_atoms_total, atom_list *atom,
                    string file)
{
  int i;
  char name[NAME_LENGTH];
  FILE *fp;
  time_t t;

  if ((fp = fopen(file, "w")) == NULL) {
    printf("Cannot open car file %s\n", file);
    exit(1);
  }
  fprintf(fp, "!BIOSYM archive 3\n");
  fprintf(fp, "PBC=OFF\n\n");
  t = time(NULL);
  fprintf(fp, "!DATE %s", ctime(&t));
  for (i=0; i<n_atoms_total; i++) {
    amino_acid_code_3(atom[i].p->residue, name);
    capitalize(name);
    if (atom[i].p->residue_num == n_amino_acids-1)
      strcat(name, "N");
    else if (atom[i].p->residue_num == 0)
      strcat(name, "n");
    else if (atom[i].p->residue == C)
      strcat(name, "H");
    fprintf(fp, "%-5s%15.9f %15.9f %15.9f %-4s %-3d %-2s %2c%8.3f\n",
            atom[i].p->name,
            atom[i].position.x, atom[i].position.y,
            atom[i].position.z, name, atom[i].p->residue_num+1,
            atom[i].p->type,
            atom[i].p->type[0], atom[i].p->charge);
  }
  fprintf(fp, "end\nend\n");
  fclose(fp);
}

/* this routine returns the next valid line from the file
*/
string getline(string line, int len, FILE *fp)
{
  string ret;

  do {
    ret = fgets(line, len, fp);
    strip(line);
  } while (ret != NULL && *line == '\x0');
  return (ret);
}

/* strip CR and LF from the end of a string
   also ignore everything to the right of !
*/
void strip(string string)
{
  for (; *string != '\x0' && *string != '\xA' && *string != '\xD'
         && *string != '!'; string++) ;
  *string = '\x0';
}
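
The line-reading routines treat everything from the first carriage return, line feed, or exclamation mark onward as absent, and skip lines that are empty after stripping. The stand-alone sketch below reproduces that behaviour so it can be tried on sample text. The function names are illustrative, and the sample lines used to exercise it are made up.

```c
#include <assert.h>
#include <stddef.h>

/* Truncate s at the first CR, LF, or '!' character. */
void demo_strip(char *s)
{
  assert(s != NULL);
  while (*s != '\0' && *s != '\n' && *s != '\r' && *s != '!')
    s++;
  *s = '\0';
}

/* Strip s in place and report whether nothing is left, which is the
   condition under which the reader moves on to the next line. */
int demo_is_skipped(char *s)
{
  demo_strip(s);
  return *s == '\0';
}
```

A data line such as "12 ! number of torsions" therefore reduces to "12 " before it is parsed, while a line that starts with an exclamation mark is skipped altogether.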

/* remove commas from string, replacing with spaces
*/
void decomma(string string)
{
  for (; *string != '\0'; string++)
    if (*string == ',') *string = ' ';
}

/* This function capitalizes a string
*/
void capitalize(string s)
{
  int o;

  o = 'a' - 'A';
  for (; *s; s++) if (*s >= 'a' && *s <= 'z') *s -= o;
}

/* This function returns the 3-letter code for the amino acid
*/
void amino_acid_code_3(acid_label label, string code_3)
{
  switch (label) {
    case G : strcpy(code_3, "Gly"); break;
    case A : strcpy(code_3, "Ala"); break;
    case V : strcpy(code_3, "Val"); break;
    case L : strcpy(code_3, "Leu"); break;
    case I : strcpy(code_3, "Ile"); break;
    case S : strcpy(code_3, "Ser"); break;
    case T : strcpy(code_3, "Thr"); break;
    case D : strcpy(code_3, "Asp"); break;
    case E : strcpy(code_3, "Glu"); break;
    case N : strcpy(code_3, "Asn"); break;
    case Q : strcpy(code_3, "Gln"); break;
    case K : strcpy(code_3, "Lys"); break;
    case H : strcpy(code_3, "His"); break;
    case R : strcpy(code_3, "Arg"); break;
    case F : strcpy(code_3, "Phe"); break;
    case Y : strcpy(code_3, "Tyr"); break;
    case W : strcpy(code_3, "Trp"); break;
    case C : strcpy(code_3, "Cys"); break;
    case M : strcpy(code_3, "Met"); break;
    case P : strcpy(code_3, "Pro"); break;
    default : strcpy(code_3, "???");
  }
}

/* This function returns the 1-letter code for the amino acid
*/
void amino_acid_code_1(acid_label label, char code_1)
{
  switch (label) {
    case G: code_1 = 'G'; break;
    case A: code_1 = 'A'; break;
    case V: code_1 = 'V'; break;
    case L: code_1 = 'L'; break;
    case I: code_1 = 'I'; break;
    case S: code_1 = 'S'; break;
    case T: code_1 = 'T'; break;
    case D: code_1 = 'D'; break;
    case E: code_1 = 'E'; break;
    case N: code_1 = 'N'; break;
    case Q: code_1 = 'Q'; break;
    case K: code_1 = 'K'; break;
    case H: code_1 = 'H'; break;
    case R: code_1 = 'R'; break;
    case F: code_1 = 'F'; break;
    case Y: code_1 = 'Y'; break;
    case W: code_1 = 'W'; break;
    case C: code_1 = 'C'; break;
    case M: code_1 = 'M'; break;
    case P: code_1 = 'P'; break;
    default: code_1 = '?';
  }
}

/* This function returns the acid label from the 1-letter amino
   acid code
*/
acid_label amino_acid_code(char code_1)
{
  acid_label ret;

  switch (code_1) {
    case 'G' : ret = G; break;
    case 'A' : ret = A; break;
    case 'V' : ret = V; break;
    case 'L' : ret = L; break;
    case 'I' : ret = I; break;
    case 'S' : ret = S; break;
    case 'T' : ret = T; break;
    case 'D' : ret = D; break;
    case 'E' : ret = E; break;
    case 'N' : ret = N; break;
    case 'Q' : ret = Q; break;
    case 'K' : ret = K; break;
    case 'H' : ret = H; break;
    case 'R' : ret = R; break;
    case 'F' : ret = F; break;
    case 'Y' : ret = Y; break;
    case 'W' : ret = W; break;
    case 'C' : ret = C; break;
    case 'M' : ret = M; break;
    case 'P' : ret = P; break;
    default : ret = BAD;
  }
  return (ret);
}
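
The three amino acid code routines all enumerate the same twenty residues in the same order. As an alternative that is not part of the listings, the same mapping can be driven by a single table. The sketch below assumes nothing about the numeric values of the acid_label enumeration and works with plain indices instead. All names in it are hypothetical.

```c
#include <assert.h>
#include <string.h>

/* One-letter codes in the order used by the switch statements. */
static const char demo_codes[] = "GAVLISTDENQKHRFYWCMP";

/* Index of code_1 within demo_codes, or -1 for an unrecognized letter. */
int demo_acid_index(char code_1)
{
  const char *hit;
  if (code_1 == '\0') return -1;   /* strchr would match the terminator */
  hit = strchr(demo_codes, code_1);
  return hit ? (int) (hit - demo_codes) : -1;
}

/* Inverse mapping: one-letter code for an index, '?' if out of range. */
char demo_acid_letter(int index)
{
  if (index < 0 || index >= (int) strlen(demo_codes)) return '?';
  return demo_codes[index];
}
```

A table of this kind keeps the forward and inverse mappings in step automatically, at the cost of tying the index values to the order of the string.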

MOLECULAR TOPOLOGY CREATION - PEPTIDE2.C

/* The topology creation routines
*/
#include "peptide.h"

/* This routine initializes the bond connection table
*/
void initialize_connection_table(int **bond_table, int n_atoms_total)
{
  int i, j;

  for (i=0; i<n_atoms_total; i++)
    for (j=0; j<MAX_BONDS; j++)
      bond_table[i][j] = -1;
}

/* This routine creates a connection table
*/
void make_connection_table(int **bond_table, int *table_num,
                           rigid_unit *unit, rigid_unit *start)
{
  int i, *j, i1, save[MAX_BONDS];

  i1 = unit->head.atom_num + *table_num;
  for (j=unit->head.bond; *j != -1; j++) {
    add_connection(bond_table, i1, *j+*table_num);
    add_connection(bond_table, *j+*table_num, i1);
  }
  for (i=0; i<unit->n_bonds; i++) {
    i1 = unit->bond[i]->tail.atom_num + *table_num;
    for (j=unit->bond[i]->tail.bond; *j != -1; j++) {
      add_connection(bond_table, i1, *j+*table_num);
      add_connection(bond_table, *j+*table_num, i1);
    }
    save[i] = unit->bond[i]->tail.atom_num + *table_num;
  }
  *table_num += unit->n_atoms;
  for (i=0; i<unit->n_bonds; i++) {
    i1 = unit->bond[i]->next->head.atom_num;
    if (unit->bond[i]->next != start) i1 += *table_num;
    add_connection(bond_table, save[i], i1);
    add_connection(bond_table, i1, save[i]);
    if (unit->bond[i]->next != start)
      make_connection_table(bond_table, table_num,
                            unit->bond[i]->next, start);
  }
}

/* This routine adds a connection to the connection table
*/
void add_connection(int **bond_table, int i1, int i2)
{
  int *i, *j;

  for (i=bond_table[i1]; *i != -1; i++) ;
  for (j=bond_table[i1]; j<i; j++) if (*j == i2) return;
  *i = i2;
}

/* This routine prints out the connection table
*/
void print_connection_table(int **bond_table, int n_atoms_total)
{
  int i, j;

  for (i=0; i<n_atoms_total; i++) {
    printf("%5d ", i);
    for (j=0; j<MAX_BONDS; j++) printf("%5d ", bond_table[i][j]);
    printf("\n");
  }
}
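
The connection table stores, for each atom, a row of bonded atom numbers padded with -1, and a bond is recorded once in each direction with duplicates ignored. The following stand-alone sketch shows the same bookkeeping on a fixed-size array. The row width and the function names are assumptions made for illustration, and the bounds check on a full row is an addition that the listing does not contain.

```c
#include <assert.h>

#define DEMO_MAX_BONDS 6

/* Mark every slot of every row as empty (-1). */
void demo_init_table(int table[][DEMO_MAX_BONDS], int n_atoms)
{
  int i, j;
  assert(n_atoms >= 0);
  for (i = 0; i < n_atoms; i++)
    for (j = 0; j < DEMO_MAX_BONDS; j++)
      table[i][j] = -1;
}

/* Record that atom a is bonded to atom b.
   Returns 1 if added, 0 if already present, -1 if the row is full. */
int demo_add_connection(int table[][DEMO_MAX_BONDS], int a, int b)
{
  int s;
  for (s = 0; s < DEMO_MAX_BONDS; s++) {
    if (table[a][s] == b) return 0;            /* duplicate, ignore */
    if (table[a][s] == -1) { table[a][s] = b; return 1; }
  }
  return -1;
}

/* Number of neighbours recorded for atom a. */
int demo_count_bonds(int table[][DEMO_MAX_BONDS], int a)
{
  int s = 0;
  while (s < DEMO_MAX_BONDS && table[a][s] != -1) s++;
  return s;
}
```

Recording the bond between atoms 0 and 1 twice leaves a single entry in row 0, matching the duplicate check in the listing.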

/* This routine determines the torsional terms
   p is set to the head pointer and it returns the tail pointer
*/
void get_torsions(torsion_list **p, int **bond_table, int *table_num,
                  atom_list *atom, rigid_unit *unit, rigid_unit *start)
{
  int i, save[MAX_BONDS];
  static torsion_list *q;
  static int i2, *j, *k;
  rigid_unit *new_unit;

  if (!*p) q = NULL;
  for (i=0; i<unit->n_bonds; i++)
    save[i] = unit->bond[i]->tail.atom_num + *table_num;
  *table_num += unit->n_atoms;
  for (i=0; i<unit->n_bonds; i++) {
    new_unit = unit->bond[i]->next;
    i2 = new_unit->head.atom_num;
    if (new_unit != start) i2 += *table_num;
    for (j=bond_table[save[i]]; *j != -1; j++)
      for (k=bond_table[i2]; *k != -1; k++)
        if (*j != i2 && save[i] != *k)
          if (!*p)
            *p = q = add_torsion(bond_table, atom, *j, save[i], i2,
                                 *k);
          else
            if (q->next = add_torsion(bond_table, atom, *j,
                                      save[i], i2, *k))
              q = q->next;
    if (new_unit != start)
      get_torsions(p, bond_table, table_num, atom, new_unit,
                   start);
  }
}

/* This routine adds a torsion to the torsion list
   Wildcards on i and l (simultaneously) are allowed for
*/
torsion_list *add_torsion(int **bond_table, atom_list *atom, int i, int j,
                          int k, int l)
{
  torsion_list t, *v;
  char wild[] = ;  /* wildcard string illegible in the source listing */
  int degen, itmp;

  /* count degeneracy for "general" torsions--don't count the torsion
     axis! */
  /* "specific" torsions have a degeneracy of 1, "general" have a
     degeneracy of degen */
  for (itmp=0; bond_table[j][itmp] != -1; itmp++);
  for (degen=0; bond_table[k][degen] != -1; degen++);
  itmp--;
  degen--;
  degen *= itmp;
  t.degen = 1;
  /* printf("%s %s %s %s %d\n",
            atom[i].p->name, atom[j].p->name,
            atom[k].p->name,
            atom[l].p->name, degen); */
  t.next = NULL;
  t.num[0] = i;
  t.num[1] = j;
  t.num[2] = k;
  t.num[3] = l;
  /* "specific" torsions */
  if (!lookup_torsion_data(atom[i].p->type, atom[j].p->type,
                           atom[k].p->type,
                           atom[l].p->type, &t.p)) {
    /* "general" torsions */
    if (!lookup_torsion_data(wild, atom[j].p->type,
                             atom[k].p->type,
                             wild, &t.p)) {
      printf("Torsional data not found for %s %s %s %s\n",
             atom[i].p->type, atom[j].p->type,
             atom[k].p->type,
             atom[l].p->type);
      return(NULL);
    }
    t.degen = degen;
  }
  /* only report nonzero torsional terms--this will screw up the 1/2
     factor for AMBER! */
  /* if (t.p->v0[0]==0 && t.p->v0[1]==0 && t.p->v0[2]==0)
       return(NULL); */
  if ((v = (torsion_list *)
       malloc(sizeof(torsion_list))) == NULL) out_of_memory();
  *v = t;
  return(v);
}
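The degeneracy assigned to a "general" torsion above is the number of distinct i-j-k-l paths that share the central j-k bond: the neighbours of j other than k, times the neighbours of k other than j. A small sketch of that count on -1-terminated neighbour rows is given here for illustration; `demo_row_length` and `demo_torsion_degeneracy` are hypothetical helper names, and the rows are assumed to include the axis partner, as they do in the connection table.

```c
#include <assert.h>
#include <stddef.h>

/* Length of a -1-terminated neighbour row. */
static int demo_row_length(const int *row)
{
  int n = 0;
  assert(row != NULL);
  while (row[n] != -1) n++;
  return n;
}

/* Number of torsions about the j-k bond: each row contains the axis
   partner once, so one entry is discounted on either side. */
static int demo_torsion_degeneracy(const int *row_j, const int *row_k)
{
  return (demo_row_length(row_j) - 1) * (demo_row_length(row_k) - 1);
}
```

For two tetravalent atoms bonded to each other (an ethane-like C-C bond) this gives 3 x 3 = 9 torsions about the bond.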

/* This routine looks up the parameters for a torsional term in the
   torsion data base
*/
logical lookup_torsion_data(string type1, string type2, string type3,
                            string type4, torsion_data **p)
{
  torsion_data **l;

  for (l=torsion_data_list; *l; l++) {
    /* forward match */
    if (strcmp((*l)->type1, type1)==0 && strcmp((*l)->type2, type2)==0 &&
        strcmp((*l)->type3, type3)==0 &&
        strcmp((*l)->type4, type4)==0)
      goto done;
    /* reversed match */
    if (strcmp((*l)->type1, type4)==0 && strcmp((*l)->type2, type3)==0 &&
        strcmp((*l)->type3, type2)==0 &&
        strcmp((*l)->type4, type1)==0)
      goto done;
  }
  return(FALSE);
 done: ;
  *p = *l;
  return(TRUE);
}
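The lookup above accepts a database entry whether the four atom types are given in the order 1-2-3-4 or 4-3-2-1, since a torsion is the same term read from either end. A compact sketch of that order-insensitive comparison follows; `demo_torsion_types_match` is a hypothetical name and the type strings used with it are purely illustrative.

```c
#include <assert.h>
#include <string.h>

/* Returns 1 if the four query types equal the entry types read either
   forwards or backwards, 0 otherwise. */
static int demo_torsion_types_match(const char *entry[4],
                                    const char *query[4])
{
  int fwd = 1, rev = 1, n;
  assert(entry != NULL && query != NULL);
  for (n = 0; n < 4; n++) {
    if (strcmp(entry[n], query[n]) != 0) fwd = 0;
    if (strcmp(entry[n], query[3 - n]) != 0) rev = 0;
  }
  return fwd || rev;
}
```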

/* This routine prints out the torsion terms
*/
void print_torsions(torsion_list *list, atom_list *atom)
{
  torsion_list *t;
  double theta;

  for (t=list; t; t=t->next)
  {
    theta = torsion(atom[t->num[0]].position,
                    atom[t->num[1]].position,
                    atom[t->num[2]].position,
                    atom[t->num[3]].position);
    printf("%-4s %-4s %-4s %-4s", atom[t->num[0]].p->name,
           atom[t->num[1]].p->name,
           atom[t->num[2]].p->name,
           atom[t->num[3]].p->name);
    /* printf("%-4d %-4d %-4d %-4d", t->num[0], t->num[1],
              t->num[2],
              t->num[3]); */
    printf("%4d ", t->degen);
    /* angles are stored in radians and printed in degrees */
    printf("%9.3lf %7.3lf %7.3lf %7.3lf %7.3lf %7.3lf %7.3lf\n",
           180.0*theta/PI,
           t->p->v0[0], t->p->v0[1], t->p->v0[2],
           180.0*t->p->phi0[0]/PI, 180.0*t->p->phi0[1]/PI,
           180.0*t->p->phi0[2]/PI);
  }
}

/* This routine determines the torsional angle (in radians) defined
   by the input positions--bonded in the order p1-p2-p3-p4
*/
double torsion(vector p1, vector p2, vector p3, vector p4)
{
  vector b1, b2, b3, n1, n2;
  double dot, len, theta;

  /* define bond vectors */
  b3.x = p1.x - p2.x; b3.y = p1.y - p2.y; b3.z = p1.z - p2.z;
  b2.x = p3.x - p2.x; b2.y = p3.y - p2.y; b2.z = p3.z - p2.z;
  b1.x = p4.x - p3.x; b1.y = p4.y - p3.y; b1.z = p4.z - p3.z;
  b2 = vector_scale(b2, 1.0);
  dot = vector_dot(b1,b2);
  /* remove the component of each bond along the torsion axis */
  n1.x = b1.x - dot*b2.x; n1.y = b1.y - dot*b2.y; n1.z = b1.z -
    dot*b2.z;
  dot = vector_dot(b3,b2);
  n2.x = b3.x - dot*b2.x; n2.y = b3.y - dot*b2.y; n2.z = b3.z -
    dot*b2.z;
  len = vector_length(n1)*vector_length(n2);
  theta = vector_dot(n1,n2)/len;
  /* watch out for theta=0,PI, which kill acos */
  if (theta > 1.0-EPS)
    theta = 0.0;
  else if (theta < -1.0+EPS)
    theta = PI;
  else
    theta = acos(theta);
  /* get proper sign on angle */
  n1 = vector_cross(n2, n1);
  if (vector_dot(n1, b2) < 0.0) theta = -theta;
  return(theta);
}

/* This function assigns the Lennard-Jones parameters
*/
void assign_lj_parameters(rigid_unit *unit, rigid_unit *start)
{
  int i;

  for (i=0; i<unit->n_atoms; i++) {
    if (!lookup_lj_data(unit->atom[i].type, &unit->atom[i].ri,
                        &unit->atom[i].ei)) {
      printf("Lennard-Jones parameters not found for atom %s\n",
             unit->atom[i].type);
      exit(1);
    }
  }
  /* recurse over the bonded units until the start unit is reached */
  for (i=0; i<unit->n_bonds; i++)
    if (unit->bond[i]->next != start)
      assign_lj_parameters(unit->bond[i]->next, start);
}

/* This function looks up the Lennard-Jones parameters for an atom
*/
logical lookup_lj_data(string type, double *ri, double *ei)
{
  lj_data **l;

  for (l=lj_data_list; *l; l++)
    if (strcmp((*l)->type, type)==0) {
      *ri = (*l)->ri;
      *ei = (*l)->ei;
      return(TRUE);
    }
  return(FALSE);
}

/* This routine determines the H-bonds that are in the molecule
*/
void get_hbonds(hbond_list **list, atom_list *atom, int n_atoms)
{
  int i,j;
  hbond_list t, *u, *v;

  *list = NULL;
  t.next = NULL;
  for (i=0; i<n_atoms; i++)
    for (j=i+1; j<n_atoms; j++)
      if (lookup_hbond_data(atom[i].p->type, atom[j].p->type,
                            &t.p)) {
        t.num[0] = i;
        t.num[1] = j;
        if ((v = (hbond_list *)
             malloc(sizeof(hbond_list))) == NULL) out_of_memory();
        *v = t;
        /* append to the list; u tracks the current tail */
        if (!*list)
          *list = u = v;
        else {
          u->next = v;
          u = u->next;
        }
      }
}

/* This function looks up the H-bond parameters for an atom pair
*/
logical lookup_hbond_data(string type1, string type2, hbond_data **p)
{
  hbond_data **l;

  for (l=hbond_data_list; *l; l++) {
    if (strcmp((*l)->type1, type1)==0 && strcmp((*l)->type2,
                                                type2)==0)
      goto done;
    if (strcmp((*l)->type2, type1)==0 && strcmp((*l)->type1,
                                                type2)==0)
      goto done;
  }
  return(FALSE);
 done: ;
  *p = *l;
  return(TRUE);
}

/* This function prints out the H-bonds
*/
void print_hbonds(hbond_list *l, atom_list *atom)
{
  for (; l; l=l->next) {
    printf("%s %s %lf %lf\n",
           atom[l->num[0]].p->name, atom[l->num[1]].p->name, l->p->a,
           l->p->b);
  }
}

/* This function assigns the atom pointers
*/
void assign_atom_pointers(int *list_num, rigid_unit *unit,
                          rigid_unit *start,
                          atom_list *atom)
{
  int i;

  for (i=0; i<unit->n_atoms; i++) atom[i+*list_num].p =
    &unit->atom[i];
  *list_num += unit->n_atoms;
  for (i=0; i<unit->n_bonds; i++)
    if (unit->bond[i]->next != start)
      assign_atom_pointers(list_num, unit->bond[i]->next, start,
                           atom);
}

GEOMETRY CREATION ROUTINES - PEPTIDES.C

/* The geometry creation routines
*/
#include "peptide.h"

logical grow_backwards = FALSE;

/* This function creates the Rosenbluth factor for an old
   configuration
*/
void old_unit(int *list_num, int n0, int n1, int n2, double *logrosen,
              rigid_unit *unit, rigid_unit *start, torsion_list *t,
              hbond_list *l, atom_list *atom, vector *twig[],
              vector p0,
              vector b0)
{
  int i, j;
  vector p[MAX_BONDS], b[MAX_BONDS], p1, b1;
  double e;

  p1 = unit->atom[unit->head.atom_num].position;
  b1 = unit->head.axis;
  do_unit_sub(list_num, n0, n1, n2, logrosen, unit, t, l, atom,
              twig,
              p1, b1, p0, b0, &e, p, b, FALSE);
  for (j=0; j<unit->n_bonds; j++)
    if (unit->bond[j]->next != start)
      old_unit(list_num, n0, n1, n2, logrosen, unit->bond[j]->next,
               start,
               t, l, atom, twig, p[j], b[j]);
}

/* This function creates the geometry of a peptide
   and the Rosenbluth factor. The growth is in one direction.
*/
void do_unit(int *list_num, int n0, int n1, int n2, double *logrosen,
             rigid_unit *unit, rigid_unit *start, torsion_list *t,
             hbond_list *l, atom_list *atom, vector *twig[], vector p0,
             vector b0, double *e)
{
  int i, j;
  vector p[MAX_BONDS], b[MAX_BONDS], p1, b1;

  unit->list_num = *list_num;
  p1 = unit->atom[unit->head.atom_num].position;
  b1 = unit->head.axis;
  do_unit_sub(list_num, n0, n1, n2, logrosen, unit, t, l, atom,
              twig,
              p1, b1, p0, b0, e, p, b, TRUE);
  /* loop over remaining units */
  for (j=0; j<unit->n_bonds; j++) {
    /* store side chain regrowth info */
    if (unit->bond[j]->next != start)
      do_unit(list_num, n0, n1, n2, logrosen,
              unit->bond[j]->next, start, t, l, atom, twig, p[j],
              b[j], e);
  }
}

/* This function creates the geometry of a peptide
   and the Rosenbluth factor. The growth is forward.
*/
void do_backbone_f(int i, int n_main, int n_atoms_total,
                   double *logrosen,
                   regrowth *main, regrowth *side,
                   torsion_list *t, hbond_list *l,
                   atom_list *atom, vector *twig[],
                   double *e, logical new)
{
  int list_num, n1, n2;
  vector p[MAX_BONDS], b[MAX_BONDS], p1, b1, p0, b0;

  if (i==0) i++;
  p0 = get_main_p0(atom, main, i);
  b0 = get_main_b0(atom, main, i);
  main += i;
  list_num = main->unit->list_num;
  n1 = n2 = n_atoms_total;
  /* loop over backbone groups */
  for (; i<n_main; i++, main++) {
    p1 = main->unit->atom[main->unit->head.atom_num].position;
    b1 = main->unit->head.axis;
    /* add on backbone unit */
    do_unit_sub(&list_num, 0, n1, n2, logrosen, main->unit, t, l,
                atom, twig,
                p1, b1, p0, b0, e, p, b, new);
    if (!new && i < n_main-1) {
      p0 = get_main_p0(atom, main, 1);
      b0 = get_main_b0(atom, main, 1);
    } else if (new && i < n_main-1) {
      p0 = p[main->unit->n_bonds-1];
      b0 = b[main->unit->n_bonds-1];
    }
    /* add on side chain */
    if (main->unit->n_bonds == 2) {
      if (new)
        do_unit(&list_num, 0, n1, n2, logrosen,
                main->unit->bond[0]->next,
                main->unit->bond[0]->next,
                t, l, atom, twig, p[0], b[0], e);
      else
        old_unit(&list_num, 0, n1, n2, logrosen,
                 main->unit->bond[0]->next,
                 main->unit->bond[0]->next,
                 t, l, atom, twig, p[0], b[0]);
    }
  }
}

/* This function creates the geometry of a peptide
   and the Rosenbluth factor. The growth is forward.
   Side chains are rigidly rotated.
*/
void do_backbone_f_rigid(int i, int n_main, int n_atoms_total,
                         double *logrosen,
                         regrowth *main, regrowth *side,
                         torsion_list *t, hbond_list *l,
                         atom_list *atom, atom_info *atom_tmp,
                         vector *twig[],
                         double *e, logical new)
{
  int list_num, n1, n2;
  vector p[MAX_BONDS], b[MAX_BONDS], p1, b1, p1a, b1a, p0, b0;
  logical false=FALSE;
  int n_atoms, j;
  atom_info *q;
  double len;
  vector b2[MAX_BONDS], v, v2;

  if (i==0) i++;
  p0 = get_main_p0(atom, main, i);
  b0 = get_main_b0(atom, main, i);
  main += i;
  list_num = main->unit->list_num;
  n1 = n2 = n_atoms_total;
  /* get first head vector */
  p1 = atom[main->unit->list_num +
            main->unit->head.atom_num].position;
  b1 = atom[main[-1].unit->list_num +
            main[-1].unit->bond[main[-1].unit->n_bonds-1]->tail.atom_num]
       .position;
  b1.x = p1.x - b1.x;
  b1.y = p1.y - b1.y;
  b1.z = p1.z - b1.z;
  for (; i<n_main; i++, main++) {
    /* change unit: treat the backbone unit and its side chain as a
       single rigid body held in atom_tmp */
    n_atoms = main->unit->n_atoms;
    q = main->unit->atom;
    if (i < n_main-1)
      main->unit->n_atoms = main[1].unit->list_num -
        main->unit->list_num;
    main->unit->atom = atom_tmp;
    for (j=0; j<main->unit->n_atoms; j++)
      main->unit->atom[j].position = atom[list_num+j].position;
    for (j=0; j<main->unit->n_bonds; j++) {
      b2[j] = main->unit->bond[j]->tail.axis;
      v = atom[main->unit->bond[j]->next->list_num +
               main->unit->bond[j]->next->head.atom_num].position;
      v2 = atom[main->unit->list_num +
                main->unit->bond[j]->tail.atom_num].position;
      v.x -= v2.x;
      v.y -= v2.y;
      v.z -= v2.z;
      main->unit->bond[j]->tail.axis = vector_scale(v, 1.0);
    }
    /* get next head vector */
    if (i < n_main-1) {
      p1a = atom[main[1].unit->list_num +
                 main[1].unit->head.atom_num].position;
      b1a = atom[main->unit->list_num +
                 main->unit->bond[main->unit->n_bonds-1]->tail.atom_num]
            .position;
      b1a.x = p1a.x - b1a.x;
      b1a.y = p1a.y - b1a.y;
      b1a.z = p1a.z - b1a.z;
    }
    /* add on unit */
    do_unit_sub(&list_num, 0, n1, n2, logrosen, main->unit, t, l,
                atom, twig,
                p1, b1, p0, b0, e, p, b, new);
    /* change unit back */
    main->unit->n_atoms = n_atoms;
    main->unit->atom = q;
    for (j=0; j<main->unit->n_bonds; j++)
      main->unit->bond[j]->tail.axis = b2[j];
    /* change head vector */
    if (!new && i < n_main-1) {
      p0 = get_main_p0(atom, main, 1);
      b0 = get_main_b0(atom, main, 1);
    } else if (new && i < n_main-1) {
      p0 = p[main->unit->n_bonds-1];
      b0 = b[main->unit->n_bonds-1];
    }
    p1 = p1a;
    b1 = b1a;
  }
}

/* This function creates the geometry of a peptide
   and the Rosenbluth factor. The growth is backward.
*/
void do_backbone_b(int i, int n_main, int n_atoms_total,
                   double *logrosen,
                   regrowth *main, regrowth *side,
                   torsion_list *t, hbond_list *l,
                   atom_list *atom, vector *twig[],
                   double *e, logical new)
{
  int list_num, n0, n1, n2, n_bonds;
  vector p[MAX_BONDS], b[MAX_BONDS], b0, p0, tmp, p1, b1;

  if (i == n_main-1) i--;
  main += i;
  n2 = n_atoms_total;
  b0 = get_main_b0(atom, main, 1);
  for (; i>=0; i--, main--) {
    n1 = main[1].unit->list_num;
    n0 = list_num = main->unit->list_num;
    /* get bond vectors */
    p0 =
      atom[main[1].unit->head.atom_num+main[1].unit->list_num].position;
    b0.x = -b0.x; b0.y = -b0.y; b0.z = -b0.z;
    n_bonds = main->unit->n_bonds;
    p1 = main->unit->atom[main->unit->bond[n_bonds-1]->
                          tail.atom_num].position;
    b1 = main->unit->bond[n_bonds-1]->tail.axis;
    b1.x = -b1.x;
    b1.y = -b1.y;
    b1.z = -b1.z;
    b1 = vector_scale(b1, vector_length(main[1].unit->head.axis));
    tmp = main->unit->bond[n_bonds-1]->tail.axis;
    main->unit->bond[n_bonds-1]->tail.axis = main->unit->head.axis;
    /* add on unit */
    grow_backwards = TRUE;
    do_unit_sub(&list_num, n0, n1, n2, logrosen, main->unit, t, l,
                atom, twig,
                p1, b1, p0, b0, e, p, b, new);
    grow_backwards = FALSE;
    main->unit->bond[n_bonds-1]->tail.axis = tmp;
    /* change head vector */
    if (!new && i > 0)
      b0 = get_main_b0(atom, main-1, 1);
    else if (new && i > 0)
      b0 = vector_scale(b[n_bonds-1], 1.0);
    /* add on side chain */
    if (main->unit->n_bonds == 2) {
      if (new)
        do_unit(&list_num, n0, n1, n2, logrosen,
                main->unit->bond[0]->next,
                main->unit->bond[0]->next,
                t, l, atom, twig, p[0], b[0], e);
      else
        old_unit(&list_num, n0, n1, n2, logrosen,
                 main->unit->bond[0]->next,
                 main->unit->bond[0]->next,
                 t, l, atom, twig, p[0], b[0]);
    }
  }
}

/* This function creates the geometry of a peptide
   and the Rosenbluth factor. The growth is backward.
   Side chains are rigidly rotated.
*/
void do_backbone_b_rigid(int i, int n_main, int n_atoms_total,
                         double *logrosen,
                         regrowth *main, regrowth *side,
                         torsion_list *t, hbond_list *l,
                         atom_list *atom, atom_info *atom_tmp,
                         vector *twig[],
                         double *e, logical new)
{
  int list_num, n0, n1, n2, n_bonds, n_atoms, j;
  vector p[MAX_BONDS], b[MAX_BONDS], b0, p0, tmp, p1, b1, p1a, b1a,
         b2[MAX_BONDS], v, v2;
  logical false=FALSE;
  atom_info *q;

  if (i == n_main-1) i--;
  main += i;
  n2 = n_atoms_total;
  /* get first head unit */
  p1 = atom[main->unit->bond[main->unit->n_bonds-1]->tail.atom_num
            +
            main->unit->list_num].position;
  b1 = atom[main[1].unit->list_num +
            main[1].unit->head.atom_num].position;
  b1.x = p1.x - b1.x;
  b1.y = p1.y - b1.y;
  b1.z = p1.z - b1.z;
  b0 = get_main_b0(atom, main, 1);
  for (; i>=0; i--, main--) {
    /* get current info */
    list_num = main->unit->list_num;
    n_bonds = main->unit->n_bonds;
    p0 =
      atom[main[1].unit->head.atom_num+main[1].unit->list_num].position;
    b0.x = -b0.x; b0.y = -b0.y; b0.z = -b0.z;
    n1 = main[1].unit->list_num;
    n0 = list_num = main->unit->list_num;
    n_atoms = main->unit->n_atoms;
    q = main->unit->atom;
    /* change current unit */
    main->unit->n_atoms = n1 - n0;
    main->unit->atom = atom_tmp;
    for (j=0; j<main->unit->n_atoms; j++)
      main->unit->atom[j].position = atom[list_num+j].position;
    /* compute bond axes */
    for (j=0; j<n_bonds; j++) {
      b2[j] = main->unit->bond[j]->tail.axis;
      v = atom[main->unit->bond[j]->next->list_num +
               main->unit->bond[j]->next->head.atom_num].position;
      v2 = atom[list_num +
                main->unit->bond[j]->tail.atom_num].position;
      v.x -= v2.x;
      v.y -= v2.y;
      v.z -= v2.z;
      main->unit->bond[j]->tail.axis = vector_scale(v, 1.0);
    }
    main->unit->bond[n_bonds-1]->tail.axis =
      vector_scale(get_main_b0(atom, main-1, 1),
                   vector_length(main->unit->head.axis));
    /* compute new head vector */
    if (i > 0) {
      p1a =
        atom[main[-1].unit->bond[main[-1].unit->n_bonds-1]->tail.atom_num+
             main[-1].unit->list_num].position;
      b1a = atom[list_num + main->unit->head.atom_num].position;
      b1a.x = p1a.x - b1a.x;
      b1a.y = p1a.y - b1a.y;
      b1a.z = p1a.z - b1a.z;
    }
    /* add on unit */
    grow_backwards = TRUE;
    do_unit_sub(&list_num, n0, n1, n2, logrosen, main->unit, t, l,
                atom, twig,
                p1, b1, p0, b0, e, p, b, new);
    grow_backwards = FALSE;
    /* restore backbone unit */
    main->unit->n_atoms = n_atoms;
    main->unit->atom = q;
    for (j=0; j<n_bonds; j++)
      main->unit->bond[j]->tail.axis = b2[j];
    /* change head vector */
    if (!new && i > 0)
      b0 = get_main_b0(atom, main-1, 1);
    else if (new && i > 0)
      b0 = vector_scale(b[n_bonds-1], 1.0);
    p1 = p1a;
    b1 = b1a;
  }
}

/* This routine creates the random positions.
   For new units, it picks and copies the winner.
*/
void do_unit_sub(int *list_num, int n0, int n1, int n2, double *logrosen,
                 rigid_unit *unit, torsion_list *t, hbond_list *l,
                 atom_list *atom, vector *twig[], vector p1, vector b1,
                 vector p0, vector b0, double *e,
                 vector p[MAX_BONDS],
                 vector b[MAX_BONDS], logical new)
{
  int i, j, i0;
  vector bond[KMAX][MAX_BONDS], point[KMAX][MAX_BONDS];
  double ftmp, cos_theta2, sin_theta2;
  double de[KMAX], sum, max;

  i0 = 0;
  if (!new) {
    /* copy old configuration to first "guess" */
    i0 = 1;
    for (j=0; j<unit->n_atoms; j++)
      twig[0][j] = atom[*list_num + j].position;
  }
  /* create guess for new unit position */
  for (i=i0; i<KMAX; i++) {
    /* pick a uniformly distributed rotation angle about the bond by
       rejection sampling inside the unit circle */
    do {
      cos_theta2 = 1-2*ran2(1.0);
      sin_theta2 = 1-2*ran2(1.0);
      ftmp = cos_theta2*cos_theta2 + sin_theta2*sin_theta2;
    } while (ftmp > 1.0);
    ftmp = sqrt(ftmp);
    cos_theta2 /= ftmp;
    sin_theta2 /= ftmp;
    add_rigid_unit(unit, twig[i], p1, b1,
                   p0, b0, point[i], bond[i],
                   cos_theta2, sin_theta2);
  }
  /* calculate probabilities--be careful about zero of energy &
     overflows */
  max = -1E99;
  for (j=0; j<KMAX; j++) {
    de[j] = -BETA * delta_energy(t, l, atom, twig[j], *list_num,
                                 n0, n1, n2,
                                 unit->n_atoms);
    if (de[j] > max) max = de[j];
  }
  sum = 0.0;
  for (j=0; j<KMAX; j++) {
    de[j] = exp(de[j] - max);
    sum += de[j];
  }
  *logrosen += log(sum) + max - log(KMAX);
  if (!new) {
    /* determine points */
    for (j=0; j<unit->n_bonds; j++) {
      p[j] = atom[*list_num +
                  unit->bond[j]->tail.atom_num].position;
      b[j] = atom[unit->bond[j]->next->list_num +
                  unit->bond[j]->next->head.atom_num].position;
      b[j].x -= p[j].x;
      b[j].y -= p[j].y;
      b[j].z -= p[j].z;
      b[j] = vector_scale(b[j], 1.0);
    }
    *list_num += unit->n_atoms;
  } else {
    /* pick winner */
    de[0] /= sum;
    for (j=1; j<KMAX; j++) de[j] = de[j-1] + de[j]/sum;
    ftmp = ran2(1.0);
    for (i=0; i<KMAX; i++) if (ftmp <= de[i]) break;
    ftmp = de[i];
    if (i > 0) ftmp -= de[i-1];
    ftmp *= sum;
    *e -= (log(ftmp)+max)/BETA;
    /* copy winner to atom array */
    for (j=0; j<unit->n_atoms; j++, (*list_num)++)
      atom[*list_num].position = twig[i][j];
    for (j=0; j<unit->n_bonds; j++) {
      p[j] = point[i][j];
      b[j] = bond[i][j];
    }
  }
}

/* This routine adds a rigid unit to the peptide structure
*/
void add_rigid_unit(rigid_unit *unit, vector *pos,
                    vector p1, vector b1, vector p0,
                    vector b0, vector point[MAX_BONDS],
                    vector bond[MAX_BONDS],
                    double cos_theta2, double sin_theta2)
{
  int i;
  double bond_len, cos_theta, sin_theta;
  vector n, r0;

  /* r0 is where the head atom of the unit must be placed */
  bond_len = vector_length(b1);
  r0.x = p0.x + b0.x*bond_len;
  r0.y = p0.y + b0.y*bond_len;
  r0.z = p0.z + b0.z*bond_len;
  b1.x /= bond_len;
  b1.y /= bond_len;
  b1.z /= bond_len;
  /* n is the rotation axis that takes b1 onto b0 */
  n = vector_cross(b1,b0);
  cos_theta = vector_dot(b0,b1);
  sin_theta = vector_length(n);
  if (sin_theta < EPS) {
    n.x = 1.0;
  } else {
    n.x /= sin_theta;
    n.y /= sin_theta;
    n.z /= sin_theta;
  }
  for (i=0; i<unit->n_atoms; i++)
    pos[i] = align(unit->atom[i].position, r0, p1,
                   n, cos_theta, sin_theta,
                   b0, cos_theta2, sin_theta2);
  for (i=0; i<unit->n_bonds; i++)
    point[i] = pos[unit->bond[i]->tail.atom_num];
  /* bond axes are directions, so they are rotated but not translated */
  r0.x = 0.0; r0.y = 0.0; r0.z = 0.0; p1=r0;
  for (i=0; i<unit->n_bonds; i++)
    bond[i] = align(unit->bond[i]->tail.axis, r0, p1,
                    n, cos_theta, sin_theta,
                    b0, cos_theta2, sin_theta2);
}
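The placement performed above amounts to a translation to the origin, two rotations given by an axis together with the cosine and sine of the angle, and a translation to the target point. The rotation routine itself is not part of this portion of the listing, so the following sketch assumes it behaves like a standard axis-angle (Rodrigues) rotation; `demo_vec`, `demo_rotate` and `demo_align` are hypothetical names introduced only for illustration.

```c
#include <assert.h>
#include <math.h>

typedef struct { double x, y, z; } demo_vec;

/* Rotate v about the unit axis n by the angle with cosine c and sine s:
   v' = v*c + (n x v)*s + n*(n.v)*(1 - c)   (Rodrigues' formula) */
static demo_vec demo_rotate(demo_vec v, demo_vec n, double c, double s)
{
  demo_vec r;
  double nv = n.x*v.x + n.y*v.y + n.z*v.z;
  double len2 = n.x*n.x + n.y*n.y + n.z*n.z;
  assert(fabs(len2 - 1.0) < 1e-9);   /* the axis must be normalised */
  r.x = v.x*c + (n.y*v.z - n.z*v.y)*s + n.x*nv*(1.0 - c);
  r.y = v.y*c + (n.z*v.x - n.x*v.z)*s + n.y*nv*(1.0 - c);
  r.z = v.z*c + (n.x*v.y - n.y*v.x)*s + n.z*nv*(1.0 - c);
  return r;
}

/* Move p so that r1 becomes the origin, rotate about n and then about
   n2, and finally shift the result to r0. */
static demo_vec demo_align(demo_vec p, demo_vec r0, demo_vec r1,
                           demo_vec n, double c, double s,
                           demo_vec n2, double c2, double s2)
{
  demo_vec r;
  r.x = p.x - r1.x; r.y = p.y - r1.y; r.z = p.z - r1.z;
  r = demo_rotate(r, n, c, s);
  r = demo_rotate(r, n2, c2, s2);
  r.x += r0.x; r.y += r0.y; r.z += r0.z;
  return r;
}
```

Rotating (1,0,0) by a quarter turn about the z axis, that is with cosine 0 and sine 1, yields (0,1,0).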

/* This routine aligns the position
*/
vector align(vector p, vector r0, vector r1, vector n, double cos_theta,
             double sin_theta, vector n2, double cos_theta2,
             double sin_theta2)
{
  vector ret;

  ret.x = p.x - r1.x;
  ret.y = p.y - r1.y;
  ret.z = p.z - r1.z;
  ret = vector_rotate(ret, n, cos_theta, sin_theta);
  ret = vector_rotate(ret, n2, cos_theta2, sin_theta2);
  ret.x += r0.x;
  ret.y += r0.y;
  ret.z += r0.z;
  return(ret);
}

*****************************************************************

ENERGY DETERMINATION - PEPTIDE4.C
*****************************************************************

/* The energy routines
*/
#include "peptide.h"
#define N0 8
#define N1 11
#define N2 81
#define N3 84
#define N2 63
#define N3 66
#define SCALE 100

/* This energy routine tries to force a S-S ring-closure for
   CAAAAAAC
*/
double zenergy(torsion_list *t, hbond_list *l, atom_list *atom,
               int n_atoms_total)
{
  double r1, r2;
  vector x, y, v;

  /* x: point 2.038 from atom N0 along the N0->N1 direction */
  x = atom[N1].position;
  x.x -= atom[N0].position.x;
  x.y -= atom[N0].position.y;
  x.z -= atom[N0].position.z;
  x = vector_scale(x, 2.038);
  x.x += atom[N0].position.x;
  x.y += atom[N0].position.y;
  x.z += atom[N0].position.z;
  /* y: point 2.038 from atom N2 along the N2->N3 direction */
  y = atom[N3].position;
  y.x -= atom[N2].position.x;
  y.y -= atom[N2].position.y;
  y.z -= atom[N2].position.z;
  y = vector_scale(y, 2.038);
  y.x += atom[N2].position.x;
  y.y += atom[N2].position.y;
  y.z += atom[N2].position.z;
  v = x;
  v.x -= atom[N2].position.x;
  v.y -= atom[N2].position.y;
  v.z -= atom[N2].position.z;
  r1 = vector_length2(v);
  v = y;
  v.x -= atom[N0].position.x;
  v.y -= atom[N0].position.y;
  v.z -= atom[N0].position.z;
  r2 = vector_length2(v);
  return(SCALE*(r1+r2)/BETA);
}

/* This energy routine tries to force a S-S ring-closure for
   CAAAAAAC
*/
double zdelta_energy(torsion_list *t, hbond_list *l, atom_list *atom,
                     vector *twig, int n_atoms, int n0, int n1,
                     int n2,
                     int n_twig)
{
  double r1, r2;
  vector x, y, v;

  r1 = r2 = 0.0;
  if (INTERVAL(N0, n_atoms, n_atoms+n_twig) &&
      INTERVAL(N2, n1, n2)) {
    /* atom N0 is in the twig, atom N2 is already placed */
    x = twig[N1-n_atoms];
    x.x -= twig[N0-n_atoms].x;
    x.y -= twig[N0-n_atoms].y;
    x.z -= twig[N0-n_atoms].z;
    x = vector_scale(x, 2.038);
    x.x += twig[N0-n_atoms].x;
    x.y += twig[N0-n_atoms].y;
    x.z += twig[N0-n_atoms].z;
    y = atom[N3].position;
    y.x -= atom[N2].position.x;
    y.y -= atom[N2].position.y;
    y.z -= atom[N2].position.z;
    y = vector_scale(y, 2.038);
    y.x += atom[N2].position.x;
    y.y += atom[N2].position.y;
    y.z += atom[N2].position.z;
    v = x;
    v.x -= atom[N2].position.x;
    v.y -= atom[N2].position.y;
    v.z -= atom[N2].position.z;
    r1 = vector_length2(v);
    v = y;
    v.x -= twig[N0-n_atoms].x;
    v.y -= twig[N0-n_atoms].y;
    v.z -= twig[N0-n_atoms].z;
    r2 = vector_length2(v);
  } else if (INTERVAL(N2, n_atoms, n_atoms+n_twig) &&
             INTERVAL(N0, n0, n_atoms)) {
    /* atom N2 is in the twig, atom N0 is already placed */
    x = atom[N1].position;
    x.x -= atom[N0].position.x;
    x.y -= atom[N0].position.y;
    x.z -= atom[N0].position.z;
    x = vector_scale(x, 2.038);
    x.x += atom[N0].position.x;
    x.y += atom[N0].position.y;
    x.z += atom[N0].position.z;
    y = twig[N3-n_atoms];
    y.x -= twig[N2-n_atoms].x;
    y.y -= twig[N2-n_atoms].y;
    y.z -= twig[N2-n_atoms].z;
    y = vector_scale(y, 2.038);
    y.x += twig[N2-n_atoms].x;
    y.y += twig[N2-n_atoms].y;
    y.z += twig[N2-n_atoms].z;
    v = x;
    v.x -= twig[N2-n_atoms].x;
    v.y -= twig[N2-n_atoms].y;
    v.z -= twig[N2-n_atoms].z;
    r1 = vector_length2(v);
    v = y;
    v.x -= atom[N0].position.x;
    v.y -= atom[N0].position.y;
    v.z -= atom[N0].position.z;
    r2 = vector_length2(v);
  }
  return(SCALE*(r1+r2)/BETA);
}

/* This routine returns the Coulomb, LJ, H-bond, and torsion
   energies
   between the atoms in *atom and the atoms in *twig.
   The atoms in *twig must be those directly following those in
   *atom.
   The atoms n_atoms to n_atoms+n_twig are in twig.
   The atoms n0 to n_atoms and n1 to n2 are in atom.
   n0 <= n_atoms <= n1 <= n2
*/
double delta_energy(torsion_list *t, hbond_list *l, atom_list *atom,
                    vector *twig, int n_atoms, int n0, int n1,
                    int n2,
                    int n_twig)
{
  return(
         d_nonbond_energy(t, atom, twig, n_atoms, n0, n1, n2,
                          n_twig) +
         d_hbond_energy(l, atom, twig, n_atoms, n0, n1, n2, n_twig) +
         d_torsion_energy(t, atom, twig, n_atoms, n0, n1, n2,
                          n_twig)
         );
}

/* This routine returns the total energy
*/
double energy(torsion_list *t, hbond_list *l, atom_list *atom,
              int n_atoms_total)
{
  return(
         nonbond_energy(t, atom, n_atoms_total) +
         hbond_energy(l, atom) +
         torsion_energy(t, atom)
         );
}

/* This routine returns the Coulomb and LJ energies
   between the atoms in *atom and the atoms in *twig.
   The atoms in *twig must be those directly following those in
   *atom.
*/
double d_nonbond_energy(torsion_list *t, atom_list *atom, vector *twig,
                        int n_atoms, int n0, int n1, int n2,
                        int n_twig)
{
#define FACT 332.06 /* converts from ei ej/rij to Kcal/mol */
  int i, j, k;
  vector r;
  double r2, r6, e, eij, rij, rij3, term, a, b;

  e = 0.0;
  for (i=n0; i<n2; i++) {
    if (INTERVAL(i, n_atoms, n1)) continue;
    for (j=0; j<n_twig; j++) {
      r.x = atom[i].position.x - twig[j].x;
      r.y = atom[i].position.y - twig[j].y;
      r.z = atom[i].position.z - twig[j].z;
      r2 = vector_length2(r);
      r6 = r2*r2*r2;
      eij = sqrt(atom[i].p->ei * atom[n_atoms+j].p->ei);
      rij = 0.5*(atom[i].p->ri + atom[n_atoms+j].p->ri);
      rij3 = rij*rij*rij;
      a = eij * rij3*rij3*rij3*rij3;
      b = 2*eij * rij3*rij3;
      /* epsilon = 4*r */
      term = FACT * atom[i].p->charge * atom[n_atoms+j].p->charge
        / (4*r2)
        + a/(r6*r6) - b/r6;
      e += term;
    }
  }
  /* subtract off 1/2 of 1-4 interactions */
  for (; t; t=t->next)
  {
    i = t->num[0]; j = t->num[3];
    /* arrange for j to be the end atom that lies in the twig */
    if (INTERVAL(i, n_atoms, n_atoms+n_twig)) {
      k = i;
      i = j;
      j = k;
    }
    if (INTERVAL(j,n_atoms,n_atoms+n_twig) &&
        (INTERVAL(i,n0,n_atoms) ||
         INTERVAL(i, n1, n2))) {
      r.x = atom[i].position.x - twig[j-n_atoms].x;
      r.y = atom[i].position.y - twig[j-n_atoms].y;
      r.z = atom[i].position.z - twig[j-n_atoms].z;
      r2 = vector_length2(r);
      r6 = r2*r2*r2;
      eij = sqrt(atom[i].p->ei * atom[j].p->ei);
      rij = 0.5 * (atom[i].p->ri + atom[j].p->ri);
      rij3 = rij*rij*rij;
      a = eij * rij3*rij3*rij3*rij3;
      b = 2*eij * rij3*rij3;
      term = FACT * atom[i].p->charge * atom[j].p->charge / (4*r2)
        + a/(r6*r6) - b/r6;
      e -= 0.5 * term;
    }
  }
  return(e);
#undef FACT
}

/* This routine returns the Coulomb and LJ energies
*/
double nonbond_energy(torsion_list *t, atom_list *atom, int
                      n_atoms_total)
{
#define FACT 332.06 /* converts from ei ej/rij to Kcal/mol */
  int i, j;
  vector r;
  double r2, r6, e, eij, rij, rij3, term, a, b;

  e = 0.0;
  for (i=0; i<n_atoms_total; i++)
    for (j=i+1; j<n_atoms_total; j++) {
      r.x = atom[i].position.x - atom[j].position.x;
      r.y = atom[i].position.y - atom[j].position.y;
      r.z = atom[i].position.z - atom[j].position.z;
      r2 = vector_length2(r);
      r6 = r2*r2*r2;
      eij = sqrt(atom[i].p->ei * atom[j].p->ei);
      rij = 0.5*(atom[i].p->ri + atom[j].p->ri);
      rij3 = rij*rij*rij;
      a = eij * rij3*rij3*rij3*rij3;
      b = 2*eij * rij3*rij3;
      /* epsilon = 4*r */
      term = FACT * atom[i].p->charge * atom[j].p->charge / (4*r2)
        + a/(r6*r6) - b/r6;
      e += term;
    }
  /* subtract off 1/2 of 1-4 interactions */
  for (; t; t=t->next)
  {
    i = t->num[0]; j = t->num[3];
    r.x = atom[i].position.x - atom[j].position.x;
    r.y = atom[i].position.y - atom[j].position.y;
    r.z = atom[i].position.z - atom[j].position.z;
    r2 = vector_length2(r);
    r6 = r2*r2*r2;
    eij = sqrt(atom[i].p->ei * atom[j].p->ei);
    rij = 0.5 * (atom[i].p->ri + atom[j].p->ri);
    rij3 = rij*rij*rij;
    a = eij * rij3*rij3*rij3*rij3;
    b = 2*eij * rij3*rij3;
    term = FACT * atom[i].p->charge * atom[j].p->charge / (4*r2)
      + a/(r6*r6) - b/r6;
    e -= 0.5 * term;
  }
  return(e);
#undef FACT
}

/* This routine returns the H-bond energy
   between the atoms in *atom and the atoms in *twig.
   The atoms in *twig must be those directly following those in
   *atom.
*/
double d_hbond_energy(hbond_list *l, atom_list *atom, vector *twig,
                      int n_atoms, int n0, int n1, int n2,
                      int n_twig)
{
  int i,j,k;
  vector r;
  double r2, e;

  e = 0.0;
  for (; l; l=l->next) {
    i = l->num[0]; j = l->num[1];
    /* arrange for j to be the atom that lies in the twig */
    if (INTERVAL(i, n_atoms, n_atoms+n_twig)) {
      k = i;
      i = j;
      j = k;
    }
    if (INTERVAL(j, n_atoms, n_atoms+n_twig) &&
        (INTERVAL(i, n0, n_atoms) ||
         INTERVAL(i, n1, n2))) {
      r.x = atom[i].position.x - twig[j-n_atoms].x;
      r.y = atom[i].position.y - twig[j-n_atoms].y;
      r.z = atom[i].position.z - twig[j-n_atoms].z;
      r2 = vector_length2(r);
      e += l->p->a / (r2*r2*r2*r2*r2*r2) -
        l->p->b/(r2*r2*r2*r2*r2);
    }
  }
  return(e);
}

/* This routine returns the H-bond energy
*/
double hbond_energy(hbond_list *l, atom_list *atom)
{
  vector r;
  double r2, e;

  e = 0.0;
  for (; l; l=l->next) {
    r.x = atom[l->num[0]].position.x - atom[l->num[1]].position.x;
    r.y = atom[l->num[0]].position.y - atom[l->num[1]].position.y;
    r.z = atom[l->num[0]].position.z - atom[l->num[1]].position.z;
    r2 = vector_length2(r);
    e += l->p->a / (r2*r2*r2*r2*r2*r2) - l->p->b/(r2*r2*r2*r2*r2);
  }
  return(e);
}
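The hydrogen-bond term above is a 12-10 potential, a/r^12 - b/r^10, evaluated from the squared distance so that no square root is needed. A minimal sketch of the single-pair expression follows; `demo_hbond_energy` is a hypothetical name and the coefficients are passed as plain numbers.

```c
#include <assert.h>
#include <math.h>

/* 12-10 hydrogen-bond term a/r^12 - b/r^10, given r2 = r*r. */
static double demo_hbond_energy(double r2, double a, double b)
{
  double r4, r10;
  assert(r2 > 0.0);
  r4 = r2 * r2;
  r10 = r4 * r4 * r2;
  return a / (r10 * r2) - b / r10;
}
```

The term changes sign at r^2 = a/b and, for positive a and b, reaches its minimum at r^2 = 1.2 a/b.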

/* This routine returns the torsional energy
   between the atoms in *atom and the atoms in *twig.
   The atoms in *twig must be those directly following those in
   *atom.
*/
double d_torsion_energy(torsion_list *t, atom_list *atom, vector *twig,
                        int n_atoms, int n0, int n1, int n2,
                        int n_twig)
{
  int i,j,k,l;
  vector v[4];
  double theta, e, tmp;

  e = 0.0;
  for (; t; t=t->next)
  {
    if (t->p->v0[0] != 0.0 || t->p->v0[1] != 0.0 || t->p->v0[2] !=
        0.0) {
      i = t->num[0]; j = t->num[1]; k = t->num[2]; l = t->num[3];
      /* skip torsions with an atom that is not yet placed */
      if (INTERVAL(i,n_atoms+n_twig,n1) || i >= n2 || i < n0)
        continue;
      if (INTERVAL(j,n_atoms+n_twig,n1) || j >= n2 || j < n0)
        continue;
      if (INTERVAL(k,n_atoms+n_twig,n1) || k >= n2 || k < n0)
        continue;
      if (INTERVAL(l,n_atoms+n_twig,n1) || l >= n2 || l < n0)
        continue;
      /* skip torsions that do not involve the twig at all */
      if (!(INTERVAL(i,n_atoms,n_atoms+n_twig) ||
            INTERVAL(j,n_atoms,n_atoms+n_twig) ||
            INTERVAL(k,n_atoms,n_atoms+n_twig) ||
            INTERVAL(l,n_atoms,n_atoms+n_twig))) continue;
      /* printf("%d %d %d %d", i, j, k, l); */
      if (INTERVAL(i,n_atoms,n_atoms+n_twig))
        v[0] = twig[i-n_atoms]; else v[0] = atom[i].position;
      if (INTERVAL(j,n_atoms,n_atoms+n_twig))
        v[1] = twig[j-n_atoms]; else v[1] = atom[j].position;
      if (INTERVAL(k,n_atoms,n_atoms+n_twig))
        v[2] = twig[k-n_atoms]; else v[2] = atom[k].position;
      if (INTERVAL(l,n_atoms,n_atoms+n_twig))
        v[3] = twig[l-n_atoms]; else v[3] = atom[l].position;
      theta = torsion(v[0], v[1], v[2], v[3]);
      tmp = (t->p->v0[0]*(1 + cos(theta-t->p->phi0[0])) +
             t->p->v0[1]*(1 + cos(2*theta-t->p->phi0[1])) +
             t->p->v0[2]*(1 + cos(3*theta-t->p->phi0[2])))
        / t->degen;
      /* printf(" %lf %lf\n", theta, tmp); */
      e += tmp;
    }
  }
  return(e);
}

/* This routine returns the torsional energy
*/
double torsion_energy(torsion_list *t, atom_list *atom)
{
    double theta, e, tmp;
    e = 0.0;
    for (; t; t=t->next)
    {
        if (t->p->v0[0] != 0.0 || t->p->v0[1] != 0.0 || t->p->v0[2] !=
            0.0) {
            theta = torsion(atom[t->num[0]].position,
                            atom[t->num[1]].position,
                            atom[t->num[2]].position,
                            atom[t->num[3]].position);
            /* operator before t->degen is illegible in the source;
               division by the degeneracy is assumed */
            tmp = (t->p->v0[0]*(1 + cos(theta-t->p->phi0[0])) +
                   t->p->v0[1]*(1 + cos(2*theta-t->p->phi0[1])) +
                   t->p->v0[2]*(1 + cos(3*theta-t->p->phi0[2])))
                  / t->degen;
            /* printf("%d %d %d %d %lf %lf\n", t->num[0], t->num[1],
                      t->num[2],
                      t->num[3], theta, tmp); */
            e += tmp;
        }
    }
    return (e);
}

MONTE CARLO ROUTINES - PEPTIDE5.C

/* The Monte Carlo routines
*/
#include "peptide.h"

/* This routine drives the configurational bias Monte Carlo
*/
void do_mc(rigid_unit *unit, torsion_list *t, hbond_list *l,
           atom_list *atom, atom_list *atom2, atom_info *atom_tmp,
           vector *twig[], regrowth *main, regrowth *side,
           int n_amino_acids, int n_atoms_total, int n_main, int n_side,
           logical cyclic)
{
    int list_num, i, j;
    double logrosen, e, e2, emin;
    vector p0, b0;
    vector v1, v2;
    emin = 1.0E99;
    list_num = 0;
    p0.x = 0.0; p0.y = 0.0; p0.z = 0.0;
    b0.x = 0.0; b0.y = 0.0; b0.z = 1.0;
    e = 0;
    logrosen = 0;
    /* create initial geometry */
    do_unit(&list_num, 0, n_atoms_total, n_atoms_total,
            &logrosen, unit, unit, t, l, atom, twig,
            p0, b0, &e);
    /* read in initial geometry */
    if (0) read_restart(atom, n_atoms_total);
    if (cyclic)
        read_cycle(t, l, atom, main, side, twig, n_main, n_side,
                   n_atoms_total);
    /*
    do_backbone_f(0, n_main, n_atoms_total, &logrosen, main,
                  side, t, l, atom, twig, &e, TRUE);
    do_backbone_b(n_main-1, n_main, n_atoms_total, &logrosen, main,
                  side, t, l, atom, twig, &e, TRUE);
    do_backbone_f_rigid(0, n_main, n_atoms_total,
                        &logrosen, main,
                        side, t, l, atom, atom_tmp, twig, &e, TRUE);
    do_backbone_b_rigid(n_main-1, n_main, n_atoms_total,
                        &logrosen, main,
                        side, t, l, atom, atom_tmp, twig, &e, TRUE);
    */

    emin = e = energy(t, l, atom, n_atoms_total);
    /* copy old positions into new */
    for (j=0; j<n_atoms_total; j++) atom2[j] = atom[j];
    /* do Monte Carlo */
    for (i=0; i<16000; i++) {
        printf("%d\n", i);
        rotate_main(atom, atom2, twig, main, side, t, l, n_main,
                    n_atoms_total, &e);
        /*
        regrow_main(t, l, atom, atom2, atom_tmp, twig, main, side,
                    n_main, n_atoms_total, &e);
        regrow_side(t, l, atom, atom2, twig, main, side,
                    n_side, n_atoms_total, &e);
        */
        if (e < emin) {
            emin = e;
            write_car_file(n_amino_acids, n_atoms_total, atom,
                           "min.car");
        }
    }
    printf("emin %lf\n", emin);
}

/* This routine reads in a restart file
*/
void read_restart(atom_list *atom, int n_atoms_total)
{
#define LINELEN 200
    FILE *fp;
    int i;
    char name[30], line[LINELEN];
    strcpy(name, "restart.car");
    if ((fp = fopen(name, "r")) == NULL) {
        printf("Data file %s does not exist\n", name);
        exit(1);
    }
    /* skip the four header lines */
    fgets(line, LINELEN, fp);
    fgets(line, LINELEN, fp);
    fgets(line, LINELEN, fp);
    fgets(line, LINELEN, fp);
    for (i=0; i<n_atoms_total; i++) {
        fgets(line, LINELEN, fp);
        sscanf(line, "%s %lf %lf %lf", name,
               &atom[i].position.x,
               &atom[i].position.y,
               &atom[i].position.z);
    }
    fclose(fp);
}

/* This routine reads in the backbone units plus one side-chain
   atom for the geometry CXXXXXXC. It then adds on each of the side
   groups randomly
*/
void read_cycle(torsion_list *t, hbond_list *l,
                atom_list *atom, regrowth *main, regrowth *side,
                vector *twig[], int n_main, int n_side,
                int n_atoms_total)
{
#define LINELEN 200
    FILE *fp;
    int i, j, k, list_num;
    char name[30], line[LINELEN];
    double logrosen, e;
/* read in loop atoms plus one side group atom */ 




    if (n_main != 2*8+3) {
        printf("This cyclic geometry is not supported\n");
        exit(1);
    }
    strcpy(name, "CXSC.car");
    if ((fp = fopen(name, "r")) == NULL) {
        printf("Data file %s does not exist\n", name);
        exit(1);
    }
    fgets(line, LINELEN, fp);
    fgets(line, LINELEN, fp);
    fgets(line, LINELEN, fp);
    fgets(line, LINELEN, fp);
    for (i=0; i<n_main; i++) {
        /* printf("%d\n", main[i].unit->list_num); */
        for (j=0; j<main[i].unit->n_atoms; j++) {
            k = main[i].unit->list_num + j;
            fgets(line, LINELEN, fp);
            sscanf(line, "%s %lf %lf %lf", name,
                   &atom[k].position.x,
                   &atom[k].position.y,
                   &atom[k].position.z);
            /* printf("%d %s %lf %lf %lf\n", k, name,
                      atom[k].position.x,
                      atom[k].position.y,
                      atom[k].position.z); */
        }
        if (main[i].unit->n_bonds == 2) {
            k++;
            fgets(line, LINELEN, fp);
            sscanf(line, "%s %lf %lf %lf", name, &atom[k].position.x,
                   &atom[k].position.y,
                   &atom[k].position.z);
            /* printf("%d %s %lf %lf %lf\n", k, name,
                      atom[k].position.x,
                      atom[k].position.y,
                      atom[k].position.z); */
        }
    }
    fclose(fp);
    /* add on side groups */
    for (i=0; i<n_side; i++) {
        list_num = side[i].unit->list_num;
        do_unit(&list_num, 0, n_atoms_total, n_atoms_total,
                &logrosen, side[i].unit, side[i].unit, t, l, atom, twig,
                get_side_p0(atom, side, i), get_side_b0(atom, side, i),
                &e);
    }
}

/* This routine regrows from a main chain unit onwards
*/
void regrow_main(torsion_list *t, hbond_list *l,
                 atom_list *atom, atom_list *atom2,
                 atom_info *atom_tmp, vector *twig[],
                 regrowth *main, regrowth *side,
                 int n_main, int n_atoms_total, double *e)
{
    logical forward;
    int list_num, i, j, k;
    double logrosen1, logrosen2, x, e2, e1;
    /* pick main group to start regrowth from */
    i = n_main*ran2(1.0);
    /* pick direction to regrow */
    forward = (ran2(1.0) > 0.5);
    printf("regrowing %s from unit %d\n", (forward) ? "forward" :
           "backward", i);
    list_num = main[i].unit->list_num;
    /* copy old positions into new */
    for (j=0; j<n_atoms_total; j++) atom2[j].position =
        atom[j].position;
    /* regrow new peptide */
    e2 = 0;
    logrosen2 = 0.0;
    if (forward)
        do_backbone_f_rigid(i, n_main, n_atoms_total, &logrosen2, main,
                            side, t, l, atom2, atom_tmp, twig, &e2,
                            TRUE);
    else
        do_backbone_b_rigid(i, n_main, n_atoms_total, &logrosen2, main,
                            side, t, l, atom2, atom_tmp, twig, &e2,
                            TRUE);
    e2 = energy(t, l, atom2, n_atoms_total);
    /* get old Rosenbluth weight */
    list_num = main[i].unit->list_num;
    e1 = 0.0;
    logrosen1 = 0.0;
    if (forward)
        do_backbone_f_rigid(i, n_main, n_atoms_total, &logrosen1, main,
                            side, t, l, atom, atom_tmp, twig, &e1,
                            FALSE);
    else
        do_backbone_b_rigid(i, n_main, n_atoms_total, &logrosen1, main,
                            side, t, l, atom, atom_tmp, twig, &e1,
                            FALSE);
    printf("Wn Wo %lf %lf\n", logrosen2, logrosen1);
    printf("En Eo %lf %lf\n", e2, *e);
    /* perform acceptance test */
    x = 1.0;
    if (logrosen1 > logrosen2) x = exp(logrosen2-logrosen1);
    /* accept new configuration */
    if (ran2(1.0) < x) {
        for (j=0; j<n_atoms_total; j++) atom[j].position =
            atom2[j].position;
        *e = e2;
        printf("SWAP\n");
    }
}

/* This routine regrows a side chain
*/
void regrow_side(torsion_list *t, hbond_list *l,
                 atom_list *atom, atom_list *atom2, vector *twig[],
                 regrowth *main, regrowth *side,
                 int n_side, int n_atoms_total, double *e)
{
    int list_num, i, j, k, n1;
    double logrosen1, logrosen2, x, e2;
    if (n_side == 0) return;
    /* pick side group to regrow */
    i = n_side*ran2(1.0);
    printf("regrowing side chain %d\n", i);
    list_num = side[i].unit->list_num;
    logrosen2 = 0.0;
    /* copy old positions into new */
    for (j=0; j<n_atoms_total; j++) atom2[j].position =
        atom[j].position;
    /* regrow side chain */
    e2 = 0;
    /* determine n1 */
    n1 = side[i].prev->bond[side[i].prev->n_bonds-1]->next->list_num;
    do_unit(&list_num, 0, n1, n_atoms_total,
            &logrosen2, side[i].unit, side[i].unit, t, l, atom2,
            twig,
            get_side_p0(atom, side, i), get_side_b0(atom, side, i),
            &e2);
    e2 = energy(t, l, atom2, n_atoms_total);
    /* get old Rosenbluth weight */
    list_num = side[i].unit->list_num;
    logrosen1 = 0.0;
    old_unit(&list_num, 0, n1, n_atoms_total, &logrosen1,
             side[i].unit, side[i].unit, t, l, atom, twig,
             get_side_p0(atom, side, i), get_side_b0(atom, side, i));
    printf("Wn Wo %lf %lf\n", logrosen2, logrosen1);
    printf("En Eo %lf %lf\n", e2, *e);
    /* perform acceptance test */
    x = 1.0;
    if (logrosen1 > logrosen2) x = exp(logrosen2-logrosen1);
    /* accept new configuration */
    if (ran2(1.0) < x) {
        for (j=side[i].unit->list_num; j<list_num; j++)
            atom[j].position = atom2[j].position;
        *e = e2;
        printf("SWAP\n");
    }
}
CONCERTED ROTATION ROUTINES - PEPTIDE6.C

/* The concerted rotation routines
*/
#include "peptide.h"
/* global variables */
vector l[8], r[8];
double theta[8], m[3][3];
logical head[8];

/* This routine performs a concerted rotation on part of the main
   chain.
*/
void rotate_main(atom_list *atom, atom_list *atom2, vector *twig[],
                 regrowth *main, regrowth *side,
                 torsion_list *t, hbond_list *l, int n_main,
                 int n_atoms_total, double *e)
{
    double jo, jn, logroseno, logrosenn, x, phi1, eo, en;
    int no, nn, i, j, i1, i2, i0;
    vector q;
    logical valid[4];
    double phi2[4], phi3[4], phi4[4], f[4];
    i0 = n_main * ran2(1.0);
    printf("Rotating from position %d\n", i0);
    /* copy atom positions to atom2 */
    for (i=0; i<n_atoms_total; i++) atom2[i].position =
        atom[i].position;
    /* determine theta, r, l */
    get_rot_params(atom, main, i0, n_main);
    /* get original jacobian */
    jo = jac(atom, main, i0, n_main);
    /* get constants needed by F5 */
    F5init(get_main_b0(atom, main, (i0+1) % n_main), &phi1);
    /* get original Rosenbluth weight */
    eo = energy(t, l, atom, n_atoms_total);
        for (i=i1; i<i2; i++)
            atom[i % n_atoms_total].position = twig[j][i %
                n_atoms_total];
    } else
        *e = eo;
}

/* This routine gets the theta, r, and l parameters */
void get_rot_params(atom_list *atom, regrowth *main, int i0,
                    int n_main)
{
    int i;
    vector t, v, v2;
    double len;
    rigid_unit *unit, *unit2, *unit3;
    /* determine theta */
    for (i=0; i<8; i++) {
        unit = main[(i+i0) % n_main].unit;
        theta[i] = vector_dot(unit->head.axis,
                              unit->bond[unit->n_bonds-1]->tail.axis)
                   /
                   vector_length(unit->head.axis);
        theta[i] = (theta[i] < 1.0-EPS) ? acos(theta[i]) : 0.0; }
    /* determine r */
    for (i=0; i<8; i++) head[i] = TRUE;
    if (fabs(theta[5]) < EPS) head[5] = FALSE;
    for (i=0; i<8; i++) {
        unit = main[(i+i0) % n_main].unit;
        r[i] = atom[unit->list_num + ((head[i]) ? unit->head.atom_num :
                    unit->bond[unit->n_bonds-1]->tail.atom_num)].position;
    }
    /* determine l */
    for (i=1; i<8; i++) {
        t.x = r[i].x - r[i-1].x;
        t.y = r[i].y - r[i-1].y;
        t.z = r[i].z - r[i-1].z;
        len = vector_length(t);
        /* if (2.03<len && len<2.05) len = 2.038;
           t = vector_scale(t, len); */
        l[i].x = len; l[i].y = l[i].z = 0.0;
        if (((main[(i+i0) % n_main].prev->type == Cunit) &&
             head[i-1]) || !head[i]) {
            l[i].x = vector_dot(t, get_main_b0(atom, main, (i+i0) %
                                n_main));
            l[i].y = sqrt(len * len - l[i].x * l[i].x);
        }
    }
    /*
    for (i=1; i<8; i++) printf("%d %lf %lf %lf %lf\n", i, theta[i],
                               l[i].x, l[i].y, l[i].z);
    for (i=1; i<8; i++)
        printf("%d %lf %lf %lf\n", i, r[i].x, r[i].y, r[i].z);
    */
}

/* This routine checks the rigid unit theta values
*/
void check_theta(atom_list *atom, regrowth *main, int n_main)
{
    int i;
    vector t, v, v2, r;
    double len, theta;
    rigid_unit *unit, *unit2, *unit3;
    for (i=0; i<n_main; i++) {
        unit = main[i % n_main].unit;
        unit2 = main[i % n_main].prev;
        unit3 = main[(i+1) % n_main].unit;
        r = atom[unit->list_num + unit->head.atom_num].position;
        t = atom[unit2->list_num +
                 unit2->bond[unit2->n_bonds-1]->tail.atom_num].position;
        t.x = r.x - t.x; t.y = r.y - t.y; t.z = r.z - t.z;
        printf("%lf %lf ", vector_length(t),
               vector_length(unit->head.axis));
        v = atom[unit3->list_num + unit3->head.atom_num].position;
        v2 = atom[unit->list_num +
                  unit->bond[unit->n_bonds-1]->tail.atom_num].position;
        v.x -= v2.x; v.y -= v2.y; v.z -= v2.z;
        theta = vector_dot(t, v) / (vector_length(v)*vector_length(t));
        theta = (theta < 1.0-EPS) ? acos(theta) : 0.0;
        printf("%d %lf ", i, theta);
        theta = vector_dot(unit->head.axis,
                           unit->bond[unit->n_bonds-1]->tail.axis) /
                vector_length(unit->head.axis);
        theta = (theta < 1.0-EPS) ? acos(theta) : 0.0;
        printf("%lf\n", theta);
    }
}

/* This routine determines the Rosenbluth weight */
void get_rot_rosenbluth(atom_list *atom, atom_list *atom2,
                        vector *twig[], regrowth *main,
                        torsion_list *t, hbond_list *l, int i0,
                        int n_main, int n_atoms_total, int *n,
                        int *j, double *logrosen, double *e)
{
    double phi[MAX_ROOTS][5], phi1, max, sum, de[MAX_ROOTS], ftmp;
    int i, k, k1, k2;
    /* get phi0-phi1 solutions */
    get_phi1(phi, n);
    if (*n == 0) return;
    if (*n > MAX_ROOTS) {
        printf("too many roots\n");
        *n = 0;
        return;
    }
    /* determine energies of solutions */
    max = -1E99;
    for (i=0; i<*n; i++) {
        get_r(phi[i][1], phi[i][2], phi[i][3], phi[i][4]);
        do_rotation(atom, twig[i], main, i0, n_main, n_atoms_total);
        k1 = main[i0].unit->list_num;
        k2 = main[(i0+7) % n_main].unit->list_num;
        if (k2 < k1) k2 += n_atoms_total;
        for (k=k1; k<k2; k++)
            atom2[k % n_atoms_total].position = twig[i][k %
                n_atoms_total];
        de[i] = -BETA*energy(t, l, atom2, n_atoms_total);
        if (de[i] > max) max = de[i];
    }
    sum = 0.0;
    for (i=0; i<*n; i++) {
        de[i] = exp(de[i] - max);
        sum += de[i];
    }
    *logrosen = log(sum) + max;
    /* pick winner */
    /* Doros move */
    /* *j = *n*ran2(1.0); */
    /* CBMC move */
    de[0] /= sum;
    for (i=1; i<*n; i++) de[i] = de[i-1] + de[i]/sum;
    ftmp = ran2(1.0);
    for (*j=0; *j<*n; (*j)++) if (ftmp <= de[*j]) break;
    /* get energy of winner */
    ftmp = de[*j];
    if (*j > 0) ftmp -= de[*j-1];
    ftmp *= sum;
    *e = -(log(ftmp)+max)/BETA;
    /* assign r to the winner */
    get_r(phi[*j][1], phi[*j][2], phi[*j][3], phi[*j][4]);
}

/* This routine calculates the jacobian
*/
double jac(atom_list *atom, regrowth *main, int i0, int n_main)
{
    int i;
    vector u[7], h[6], t, v;
    double b[5][5];
    /* form ui and hi */
    for (i=1; i<7; i++) u[i] = get_main_b0(atom, main, (i0 + i)
                                           % n_main);
    for (i=1; i<5; i++) h[i] = r[i];
    h[5] = atom[main[(i0+5) % n_main].unit->list_num +
                main[(i0+5) % n_main].unit->head.atom_num].position;
    v.x = r[6].x - h[5].x; v.y = r[6].y - h[5].y;
    v.z = r[6].z - h[5].z;
    v = vector_scale(v, 1.0);
    /* form B matrix */
    for (i=1; i<6; i++) {
        t.x = r[5].x - h[i].x;
        t.y = r[5].y - h[i].y;
        t.z = r[5].z - h[i].z;
        t = vector_cross(u[i], t);
        b[0][i-1] = t.x;
        b[1][i-1] = t.y;
        b[2][i-1] = t.z;
    }
    for (i=1; i<6; i++) {
        t = vector_cross(u[i], u[6]);
        b[3][i-1] = t.x;
        b[4][i-1] = t.y;
    }
    return (1.0/fabs(det5(b)));
}

/* This routine rotates phi0 to change r[1].
   It returns the new b0 for unit i0+1.
*/
vector rotate_r1(atom_list *atom, regrowth *main, int i0, int
                 n_main)
{
    double c, s, y;
    vector x, n;
    /* choose delta phi0 */
    y = DPHI * (1-2*ran2(1.0));
    c = cos(y);
    s = sin(y);
    n = get_main_b0(atom, main, i0);
    /* rotate about axis */
    x = r[1];
    x.x -= r[0].x;
    x.y -= r[0].y;
    x.z -= r[0].z;
    x = vector_rotate(x, n, c, s);
    r[1].x = r[0].x + x.x;
    r[1].y = r[0].y + x.y;
    r[1].z = r[0].z + x.z;
    /* compute new b0 for unit i0+1 */
    return (vector_rotate(get_main_b0(atom, main, (i0+1) % n_main),
                          n, c, s));
}

/* This routine constructs r2-r4 from the theta, phi
   information */
void get_r(double phi1, double phi2, double phi3, double phi4)
{
    int i;
    vector x, y;
    /*
    printf("\n");
    printf("%lf %lf %lf %lf\n", phi1, phi2, phi3, phi4);
    */
    x = bxm(m, l[1]);
    r[1].x = x.x + r[0].x;
    r[1].y = x.y + r[0].y;
    r[1].z = x.z + r[0].z;
    x = bxm(m, flory_rot(theta[1], phi1, l[2]));
    r[2].x = x.x + r[1].x;
    r[2].y = x.y + r[1].y;
    r[2].z = x.z + r[1].z;
    x = bxm(m, flory_rot(theta[1], phi1, flory_rot(theta[2], phi2,
                                                   l[3])));
    r[3].x = x.x + r[2].x;
    r[3].y = x.y + r[2].y;
    r[3].z = x.z + r[2].z;
    x = bxm(m, flory_rot(theta[1], phi1, flory_rot(theta[2],
            phi2, flory_rot(theta[3], phi3, l[4]))));
    r[4].x = x.x + r[3].x;
    r[4].y = x.y + r[3].y;
    r[4].z = x.z + r[3].z;
    /*
    for (i=1; i<7; i++)
        printf("%d %lf %lf %lf\n", i, r[i].x, r[i].y, r[i].z);
    */
}

/* This routine rotates the rigid units to the positions
   of the concerted rotation.
*/
void do_rotation(atom_list *atom, vector *twig, regrowth *main,
                 int i0, int n_main, int n_atoms_total)
{
    int i, j, i1, i2, i3, j2;
    double m[3][3], a[3][3], tmp, len2;
    vector x1, x2, y1, y2, x;
    rigid_unit *unit;
    for (i=-1; i<6; i++) {
        i1 = (i+i0+n_main) % n_main;
        i2 = (i+i0+1) % n_main;
        i3 = (i+i0+2) % n_main;
        /* get x1 & x2 */
        x1 = r[i+1];
        x = (i > -1) ?
            twig[main[i1].unit->bond[main[i1].unit->n_bonds-1]->tail.atom_num +
                 main[i1].unit->list_num] :
            atom[main[i1].unit->bond[main[i1].unit->n_bonds-1]->tail.atom_num +
                 main[i1].unit->list_num].position;
        x1.x -= x.x; x1.y -= x.y; x1.z -= x.z;
        x2 = atom[main[i2].unit->list_num + ((head[i+1]) ?
                  main[i2].unit->head.atom_num :
                  main[i2].unit->bond[main[i2].unit->n_bonds-1]->tail.atom_num)]
             .position;
        x = atom[main[i1].unit->bond[main[i1].unit->n_bonds-1]->tail.atom_num +
                 main[i1].unit->list_num].position;
        x2.x -= x.x; x2.y -= x.y; x2.z -= x.z;
        /* get rotation matrix */
        flory_lab(a, x1, x2);
        /* get y1 & y2 */
        y1 = r[i+2];
        x = (i > -1) ?
            twig[main[i1].unit->bond[main[i1].unit->n_bonds-1]->tail.atom_num +
                 main[i1].unit->list_num] :
            atom[main[i1].unit->bond[main[i1].unit->n_bonds-1]->tail.atom_num +
                 main[i1].unit->list_num].position;
        y1.x -= x.x; y1.y -= x.y; y1.z -= x.z;
        y2 = atom[main[i3].unit->list_num + ((head[i+2]) ?
                  main[i3].unit->head.atom_num :
                  main[i3].unit->bond[main[i3].unit->n_bonds-1]->tail.atom_num)]
             .position;
        x = atom[main[i1].unit->bond[main[i1].unit->n_bonds-1]->tail.atom_num +
                 main[i1].unit->list_num].position;
        y2.x -= x.x; y2.y -= x.y; y2.z -= x.z;
        y2 = mxb(a, y2);
        /* get projection */
        len2 = vector_length2(x1);
        tmp = vector_dot(y2, x1) / len2;
        y2.x -= x1.x * tmp;
        y2.y -= x1.y * tmp;
        y2.z -= x1.z * tmp;
        tmp = vector_dot(y1, x1) / len2;
        y1.x -= x1.x * tmp;
        y1.y -= x1.y * tmp;
        y1.z -= x1.z * tmp;
        /* get rotation matrix */
        flory_lab(m, y1, y2);
        mxm(m, a);
        /* perform rotation */
        x1 = atom[main[i1].unit->bond[main[i1].unit->n_bonds-1]->tail.atom_num +
                  main[i1].unit->list_num].position;
        x2 = (i > -1) ?
             twig[main[i1].unit->bond[main[i1].unit->n_bonds-1]->tail.atom_num +
                  main[i1].unit->list_num] : x1;
        j2 = main[i3].unit->list_num;
        if (i3 == 0) j2 = n_atoms_total;
        for (j=main[i2].unit->list_num; j < j2; j++) {
            x = atom[j].position;
            x.x -= x1.x;
            x.y -= x1.y;
            x.z -= x1.z;
            x = mxb(m, x);
            x.x += x2.x;
            x.y += x2.y;
            x.z += x2.z;
            twig[j] = x;
        }
    }
}

/* This routine determines the phi1-phi3 values
*/
void get_phi1(double phi[MAX_ROOTS][5], int *n)
{
#define NTRY 10000
    int i, j;
    logical valid[NTRY+1][4];
    double phi1[NTRY+1], phi2[4], phi3[4], phi4[4];
    double f[NTRY+1][4];
    *n = 0;
    i = 0;
    /* Evaluate F5 */
    for (i=0; i<=NTRY; i++) {
        phi1[i] = -PI + i*2*PI/NTRY;
        F5(phi1[i], phi2, phi3, phi4, f[i], valid[i]);
    }
    /* Now search for roots */
    for (i=0; i<NTRY; i++) {
        for (j=0; j<4; j++) {
            if (!valid[i][j] || !valid[i+1][j]) continue;
            if ((f[i][j] < 0 && f[i+1][j] > 0) ||
                (f[i][j] > 0 && f[i+1][j] < 0)) {
                if (*n >= MAX_ROOTS) {
                    printf("Excessive number of roots failure get_phi1\n");
                    return;
                }
                get_root(phi1[i], phi1[i+1], &phi[*n][1], &phi[*n][2],
                         &phi[*n][3], &phi[*n][4], j);
                (*n)++;
            }
        }
    }
#undef NTRY
}

/* This routine refines a root using bisection
*/
void get_root(double x0, double x1, double *p1, double *p2,
              double *p3, double *p4, int n)
{
    logical valid[4];
    double phi2[4], phi3[4], phi4[4], f[4];
    /* order roots: f(x0) < 0 && f(x1) > 0 */
    F5(x1, phi2, phi3, phi4, f, valid);
    if (f[n] < 0.0) {
        *p1 = x0;
        x0 = x1;
        x1 = *p1;
    }
    /* do bisection to refine root */
    do {
        *p1 = 0.5*(x1+x0);
        F5(*p1, phi2, phi3, phi4, f, valid);
        if (f[n] > 0) x1 = *p1; else x0 = *p1;
    } while (fabs(x1-x0) > EPS);
    *p2 = phi2[n];
    *p3 = phi3[n];
    *p4 = phi4[n];
}
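The refinement above is ordinary bisection: the bracket is first ordered so that the function is negative at one end and positive at the other, and the midpoint then replaces whichever end shares its sign. A minimal sketch for a generic scalar function is given below. The names sketch_bisect and sketch_square_minus_two are illustrative, and the tolerance is passed as a parameter rather than taken from the EPS constant of the listing.

```c
#include <assert.h>
#include <math.h>

typedef double (*sketch_fn)(double);

/* Refines a root of f bracketed by x0 and x1 until the bracket is
   narrower than eps. The two ends must give values of opposite sign. */
double sketch_bisect(sketch_fn f, double x0, double x1, double eps)
{
    double mid, tmp;

    assert(eps > 0.0);
    /* order the bracket so that f(x0) < 0 and f(x1) > 0 */
    if (f(x1) < 0.0) {
        tmp = x0; x0 = x1; x1 = tmp;
    }
    do {
        mid = 0.5 * (x0 + x1);
        if (f(mid) > 0.0) x1 = mid; else x0 = mid;
    } while (fabs(x1 - x0) > eps);
    return mid;
}

/* Example function with a root at sqrt(2) */
double sketch_square_minus_two(double x) { return x * x - 2.0; }
```

Bisection halves the bracket on every pass, so it needs only a sign change to converge; that suits a function such as F5, which is evaluated numerically and has no convenient derivative.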

/* constants */
double c10, c11, c12, q12, c20, c21, c22, fact1, fact2;
vector x0, u60;

/* This routine sets up constants that F5 uses.
   The constants are independent of phi1
*/
void F5init(vector q2, double *phi1)
{
    int i, j;
    vector t;
    double c1, c2, a[3][3], tmp;
    t.x = 1.0; t.y = t.z = 0.0;
    flory_labinv(m, q2, t);
    t = mxb(m, t);
    if (fabs(t.y) < EPS && fabs(t.z) < EPS) {
        c1 = 1.0;
        c2 = 0.0;
    } else {
        c1 = (l[1].y*t.y + t.z*l[1].z)/(t.y*t.y + t.z*t.z);
        c2 = (-l[1].z*t.y + t.z*l[1].y)/(t.y*t.y + t.z*t.z);
    }
    if (fabs(c1) < EPS && fabs(c2) < EPS) c1 = 1.0;
    a[0][0] = 1; a[0][1] = 0; a[0][2] = 0;
    a[1][0] = 0; a[1][1] = c1; a[1][2] = c2;
    a[2][0] = 0; a[2][1] = -c2; a[2][2] = c1;
    mxm(a, m);
    for (i=0; i<3; i++)
        for (j=0; j<3; j++)
            m[i][j] = a[i][j];

    t.x = r[2].x - r[1].x; t.y = r[2].y - r[1].y; t.z = r[2].z -
        r[1].z;
    t = mxb(m, t);
    tmp = (sin(theta[1])*l[2].x - cos(theta[1])*l[2].y);
    *phi1 = atan2(t.z/tmp, t.y/tmp);
    x0 = mxb(m, x0);
    x0.x -= l[1].x;
    x0.y -= l[1].y;
    x0.z -= l[1].z;

    if (fabs(theta[5]) < EPS && fabs(theta[3]) < EPS) {
        c10 = l[3].x*cos(theta[4]);
        c11 = -(cos(theta[2])*l[3].x + sin(theta[2])*l[3].y);
        tmp = sin(theta[2])*l[3].x - cos(theta[2])*l[3].y;
        c10 /= tmp;
        c11 /= tmp;
    } else if (fabs(theta[5]) < EPS && fabs(theta[3]) > EPS) {
        c10 = -l[5].x - l[4].x*cos(theta[4]);
        c11 = -(cos(theta[2])*l[3].x + sin(theta[2])*l[3].y);
        c12 = 1.0/(sin(theta[2])*l[3].x - cos(theta[2])*l[3].y);
    } else if (fabs(theta[3]) > EPS) {
        t.z = 0.0;
        t.x = l[4].x*cos(theta[4]) - l[4].y*sin(theta[4]) + l[5].x;
        t.y = l[4].x*sin(theta[4]) + l[4].y*cos(theta[4]) + l[5].y;
        q12 = vector_length2(t);
        c10 = q12 - vector_length2(l[3]);
        c11 = 2*(cos(theta[2])*l[3].x + sin(theta[2])*l[3].y);
        c12 = -1.0/(2*(sin(theta[2])*l[3].x - cos(theta[2])*l[3].y));
    } else {
        c10 = l[3].x + l[4].x + l[5].x*cos(theta[4]);
        c11 = -cos(theta[2]);
        tmp = sin(theta[2]);
        c10 /= tmp;
        c11 /= tmp;
    }
    c20 = vector_length2(l[5]) - vector_length2(l[4]);
    c21 = 2*(cos(theta[3])*l[4].x + sin(theta[3])*l[4].y);
    c22 = -1.0/(2*(sin(theta[3])*l[4].x - cos(theta[3])*l[4].y));
    fact1 = sin(theta[4])*l[5].x - cos(theta[4])*l[5].y;
    fact2 = l[6].x*cos(theta[5]) + l[6].y*sin(theta[5]);
    u60.x = r[6].x - r[5].x; u60.y = r[6].y - r[5].y; u60.z = r[6].z
        - r[5].z;
}

/* This routine returns the F5 function of Doros.
   *n is the number of solutions, which are in f.
*/
void F5(double phi1, double phi2[4], double phi3[4], double
        phi4[4], double f[4], logical valid[4])
{
    int i, j;
    double tmp, c1, c2;



    vector v1, q1, q2, x, y, t, u6;
    double a[3][3], rot1[3][3], rot2[3][3], rot3[3][3], rot4[3][3];
    /* determine c1 */
    valid[0] = valid[1] = valid[2] = valid[3] = FALSE;
    flory_rot_matrix(theta[1], phi1, rot1);
    x = bxm(rot1, x0);
    x.x -= l[2].x; x.y -= l[2].y; x.z -= l[2].z;
    v1 = x;
    if (fabs(theta[5]) < EPS && fabs(theta[3]) < EPS) {
        x = bxm(rot1, mxb(m, vector_scale(u60, 1.0)));
        c1 = (c10 + x.x*c11) / sqrt(x.y*x.y + x.z*x.z);
    } else if (fabs(theta[5]) < EPS && fabs(theta[3]) > EPS) {
        x = bxm(m, flory_rot(theta[1], phi1, l[2]));
        r[2].x = x.x + r[1].x; r[2].y = x.y + r[1].y; r[2].z = x.z +
            r[1].z;
        t.x = r[5].x - r[2].x; t.y = r[5].y - r[2].y; t.z = r[5].z -
            r[2].z;
        x = bxm(rot1, mxb(m, vector_scale(u60, 1.0)));
        c1 = c12*(c10 + vector_dot(t,
                  u60)/vector_length(u60) + x.x*c11) / sqrt(x.y*x.y +
                  x.z*x.z);
    } else if (fabs(theta[3]) > EPS) {
        c1 = c12*(c10 - vector_length2(x) + x.x*c11) / sqrt(x.y*x.y +
                  x.z*x.z);
    } else {
        c1 = (c10 + x.x*c11) / sqrt(x.y*x.y + x.z*x.z);
    }
    /* printf("c1 %lf\n",c1); */
    if (fabs(c1) > 1) return;
    /* determine phi2 */
    tmp = asin(c1);
    phi2[0] = phi2[2] = -atan(x.y/x.z);
    if (x.z < 0) phi2[0] = phi2[2] = phi2[0] - PI;
    phi2[0] += tmp;
    phi2[2] += PI - tmp;
    phi2[1] = phi2[0];
    phi2[3] = phi2[2];
    x = v1;
    /* determine c2 and phi3 */
    for (i=0; i<2; i++) {
        y = flory_rotinv(theta[2], phi2[2*i], x);
        y.x -= l[3].x; y.y -= l[3].y; y.z -= l[3].z;
        c2 = c22*(c20 - vector_length2(y) + y.x*c21) / sqrt(y.y*y.y +
                  y.z*y.z);
        /* printf("c2 %lf\n",c2); */
        if (fabs(c2) <= 1) {
            tmp = asin(c2);
            phi3[2*i] = phi3[2*i+1] = -atan(y.y/y.z);
            if (y.z < 0) phi3[2*i] = phi3[2*i+1] = phi3[2*i+1] - PI;
            phi3[2*i] += tmp;
            phi3[2*i+1] += PI - tmp;
            valid[2*i] = valid[2*i+1] = TRUE;
        }
    }

    for (i=0; i<4; i++) {
        if (!valid[i]) continue;
        /* determine r4 */
        flory_rot_matrix(theta[2], phi2[i], rot2);
        flory_rot_matrix(theta[3], phi3[i], rot3);
        x = mxb(rot3, l[4]);
        x.x += l[3].x; x.y += l[3].y; x.z += l[3].z;
        x = mxb(rot2, x);
        x.x += l[2].x; x.y += l[2].y; x.z += l[2].z;
        x = mxb(rot1, x);
        x.x += l[1].x; x.y += l[1].y; x.z += l[1].z;
        x = bxm(m, x);
        x.x += r[0].x; x.y += r[0].y; x.z += r[0].z;
        /* determine F5 */
        if (fabs(theta[5]) < EPS && fabs(theta[3]) < EPS) {
            v1.x = r[6].x - x.x; v1.y = r[6].y - x.y; v1.z = r[6].z -
                x.z;
            f[i] = sqrt((l[6].x+l[5].x)*(l[6].x+l[5].x) +
                        l[5].y*l[5].y) - vector_length(v1);
        } else if (fabs(theta[5]) < EPS && fabs(theta[3]) > EPS) {
            x = bxm(m, mxb(rot1, mxb(rot2, mxb(rot3, l[4]))));
            f[i] = vector_dot(x, u60) /
                   (vector_length(x)*vector_length(u60)) - cos(theta[4]);
        } else {
            x.x = r[5].x - x.x; x.y = r[5].y - x.y; x.z = r[5].z - x.z;
            x = mxb(m, x);
            x = bxm(rot3, bxm(rot2, bxm(rot1, x)));
            phi4[i] = atan2(x.z/fact1, x.y/fact1);
            u6 = mxb(m, u60);
            x.x = 1.0; x.y = 0; x.z = 0;
            f[i] = vector_dot(u6, mxb(rot1, mxb(rot2, mxb(rot3,
                              flory_rot(theta[4], phi4[i], x)))))
                   - fact2;
        }
    }
}

GEOMETRY/ROTATION ROUTINES - PEPTIDE7.C

/* The geometry routines
*/
#include "peptide.h"

/* This routine rotates the vector a about n by theta
   (counterclockwise is +)
   r' = r cos(theta) + n (n.r) (1-cos(theta)) + nxr sin(theta)
*/
vector vector_rotate(vector a, vector n, double cos_theta, double
                     sin_theta)
{
    double fact;
    vector ret, v;
    fact = (n.x*a.x + n.y*a.y + n.z*a.z) * (1.0 - cos_theta);
    v = vector_cross(n, a);
    ret.x = a.x*cos_theta + n.x*fact + v.x*sin_theta;
    ret.y = a.y*cos_theta + n.y*fact + v.y*sin_theta;
    ret.z = a.z*cos_theta + n.z*fact + v.z*sin_theta;
    return (ret);
}

/* This routine returns main-chain b0
   i=0 noncyclic case should never happen--it won't be right
*/
vector get_main_b0(atom_list *atom, regrowth *main, int i)
{
    vector x, y;
    if (main[i].prev == NULL) {
        x.x = x.y = 0.0;
        x.z = 1.0;
        return (x);
    }
    x = atom[main[i].unit->list_num +
             main[i].unit->head.atom_num].position;
    y = atom[main[i].prev->bond[main[i].prev->n_bonds-1]->tail.atom_num +
             main[i].prev->list_num].position;
    x.x -= y.x;
    x.y -= y.y;
    x.z -= y.z;
    return (vector_scale(x, 1.0));
}

/* This routine returns main-chain p0
   i=0 noncyclic case should never happen--it won't be right
*/
vector get_main_p0(atom_list *atom, regrowth *main, int i)
{
    vector x;
    if (main[i].prev == NULL) {
        x.x = x.y = x.z = 0.0;
        return (x);
    }
    x = atom[main[i].prev->bond[main[i].prev->n_bonds-1]->tail.atom_num +
             main[i].prev->list_num].position;
    return (x);
}

/* This routine returns side-chain b0 */
vector get_side_b0(atom_list *atom, regrowth *side, int i)
{
    vector x, y;
    x = atom[side[i].unit->list_num +
             side[i].unit->head.atom_num].position;
    y = atom[side[i].prev->list_num +
             side[i].prev->head.atom_num].position;
    x.x -= y.x;
    x.y -= y.y;
    x.z -= y.z;
    return (vector_scale(x, 1.0));
}

/* This routine returns side-chain p0 */
vector get_side_p0(atom_list *atom, regrowth *side, int i)
{
    vector x;
    x = atom[side[i].prev->list_num +
             side[i].prev->head.atom_num].position;
    return (x);
}

/* This routine gives the Flory rotation matrix
*/
void flory_rot_matrix (double theta, double phi, double m[3][3])
{
    double cost, sint, cosp, sinp;

    cost = cos(theta); sint = sin(theta);
    cosp = cos(phi);   sinp = sin(phi);
    m[0][0] = cost;
    m[0][1] = sint;
    m[0][2] = 0.0;
    m[1][0] = sint*cosp;
    m[1][1] = -cost*cosp;
    m[1][2] = sinp;
    m[2][0] = sint*sinp;
    m[2][1] = -cost*sinp;
    m[2][2] = -cosp;
}

/* This routine does the Flory rotation
*/
vector flory_rot (double theta, double phi, vector a)
{
    vector t;
    double cost, sint, cosp, sinp, tmp;

    cost = cos(theta); sint = sin(theta);
    cosp = cos(phi);   sinp = sin(phi);
    tmp = sint*a.x - cost*a.y;
    t.x = cost*a.x + sint*a.y;
    t.y = cosp*tmp + sinp*a.z;
    t.z = sinp*tmp - cosp*a.z;
    return (t);
}

/* This routine does the inverse Flory rotation
*/
vector flory_rotinv (double theta, double phi, vector a)
{
    vector t;
    double cost, sint, cosp, sinp, tmp;

    cost = cos(theta); sint = sin(theta);
    cosp = cos(phi);   sinp = sin(phi);
    tmp = cosp*a.y + sinp*a.z;
    t.x = cost*a.x + sint*tmp;
    t.y = sint*a.x - cost*tmp;
    t.z = sinp*a.y - cosp*a.z;
    return (t);
}

/* This routine constructs the lab transformation to go from l to r
*/
void flory_lab (double m[3][3], vector r, vector l)
{
    double sin_theta, cos_theta;
    vector n;

    r = vector_scale (r, 1.0);
    l = vector_scale (l, 1.0);
    n = vector_cross (l, r);
    cos_theta = vector_dot (l, r);
    sin_theta = vector_length (n);
    if (sin_theta < EPS) {
        n.x = 1.0;
    } else {
        n.x /= sin_theta;
        n.y /= sin_theta;
        n.z /= sin_theta;
    }
    m[0][0] = cos_theta +
    m[0][1] =
    m[0][2] =
    m[1][0] =
    m[1][1] = cos_theta +
    m[1][2] =
    m[2][0] =
    m[2][1] =
    m[2][2] = cos_theta +
}

/* This routine constructs the inverse lab transformation
*/
void flory_labinv (double m[3][3], vector r, vector l)
{
    double sin_theta, cos_theta;
    vector n;

    r = vector_scale (r, 1.0);
    l = vector_scale (l, 1.0);
    n = vector_cross (l, r);
    cos_theta = vector_dot (l, r);
    sin_theta = vector_length (n);
    if (sin_theta < EPS) {
        n.x = 1.0;
    } else {
        n.x /= sin_theta;
        n.y /= sin_theta;
        n.z /= sin_theta;
    }
    m[0][0] = cos_theta + n.x*n.x*(1.0-cos_theta);
    m[1][0] = n.x*n.y*(1.0-cos_theta) - sin_theta*n.z;
    m[2][0] = n.x*n.z*(1.0-cos_theta) + sin_theta*n.y;
    m[0][1] = n.y*n.x*(1.0-cos_theta) + sin_theta*n.z;
    m[1][1] = cos_theta + n.y*n.y*(1.0-cos_theta);
    m[2][1] = n.y*n.z*(1.0-cos_theta) - sin_theta*n.x;
    m[0][2] = n.z*n.x*(1.0-cos_theta) - sin_theta*n.y;
    m[1][2] = n.z*n.y*(1.0-cos_theta) + sin_theta*n.x;
    m[2][2] = cos_theta + n.z*n.z*(1.0-cos_theta);
}

/* This routine returns a vector cross product
*/
vector vector_cross (vector a, vector b)
{
    vector ret;

    ret.x = a.y*b.z - a.z*b.y;
    ret.y = a.z*b.x - a.x*b.z;
    ret.z = a.x*b.y - a.y*b.x;
    return (ret);
}

/* This function scales the vector v so that |v| = r
*/
vector vector_scale (vector v, double r)
{
    double ftmp;

    ftmp = sqrt(v.x*v.x + v.y*v.y + v.z*v.z);
    v.x *= r/ftmp;
    v.y *= r/ftmp;
    v.z *= r/ftmp;
    return (v);
}

/* This routine returns mxn in m
*/
void mxm (double m[3][3], double n[3][3])
{
    int i,j,k;
    double a[3][3];

    for (i=0; i<3; i++)
        for (j=0; j<3; j++) {
            a[i][j] = 0.0;
            for (k=0; k<3; k++) a[i][j] += m[i][k]*n[k][j];
        }
    for (i=0; i<3; i++)
        for (j=0; j<3; j++)
            m[i][j] = a[i][j];
}

/* This routine returns det(m), where m is 5x5
*/
double det5 (double m[5][5])
{
    int i,j,k;
    double a[5][5], fact;

    for (i=0; i<5; i++)
        for (j=0; j<5; j++)
            a[i][j] = m[i][j];
    for (i=0; i<4; i++) {
        for (k=i+1; k<5; k++) {
            fact = a[k][i] / a[i][i];
            for (j=i; j<5; j++) a[k][j] -= fact*a[i][j];
        }
    }
    return (a[0][0]*a[1][1]*a[2][2]*a[3][3]*a[4][4]);
}
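
The 5x5 determinant above is computed by Gaussian elimination without row exchanges, so it divides by zero whenever a pivot a[i][i] vanishes, even for a nonsingular matrix. The sketch below shows one common remedy, partial pivoting. It is an illustration under that assumption and is not part of the listing; the names N5 and det5_pivot are hypothetical.

```c
#include <math.h>

#define N5 5

/* Determinant of a 5x5 matrix by Gaussian elimination with partial
   pivoting.  The input is copied, so the caller's matrix is unchanged. */
double det5_pivot(double m[N5][N5])
{
    double a[N5][N5], fact, tmp, d = 1.0;
    int i, j, k, p;

    for (i = 0; i < N5; i++)
        for (j = 0; j < N5; j++)
            a[i][j] = m[i][j];

    for (i = 0; i < N5; i++) {
        /* pick the row with the largest entry in column i */
        p = i;
        for (k = i+1; k < N5; k++)
            if (fabs(a[k][i]) > fabs(a[p][i])) p = k;
        if (a[p][i] == 0.0)
            return 0.0;               /* whole column is zero: singular */
        if (p != i) {
            for (j = 0; j < N5; j++) {
                tmp = a[i][j]; a[i][j] = a[p][j]; a[p][j] = tmp;
            }
            d = -d;                   /* a row swap flips the sign */
        }
        d *= a[i][i];
        for (k = i+1; k < N5; k++) {
            fact = a[k][i] / a[i][i];
            for (j = i; j < N5; j++) a[k][j] -= fact*a[i][j];
        }
    }
    return d;
}
```

A matrix whose leading entry is zero, such as a row permutation of the identity, is handled by this variant.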

/* This routine returns det(m), where m is 3x3
*/
double det (double m[3][3])
{
    return (m[0][0]*m[1][1]*m[2][2] + m[0][1]*m[1][2]*m[2][0] +
            m[0][2]*m[1][0]*m[2][1] - m[2][0]*m[1][1]*m[0][2] -
            m[1][0]*m[0][1]*m[2][2] - m[0][0]*m[2][1]*m[1][2]);
}

/* This routine returns Mb
*/
vector mxb (double m[3][3], vector b)
{
    vector t;

    t.x = m[0][0]*b.x + m[0][1]*b.y + m[0][2]*b.z;
    t.y = m[1][0]*b.x + m[1][1]*b.y + m[1][2]*b.z;
    t.z = m[2][0]*b.x + m[2][1]*b.y + m[2][2]*b.z;
    return (t);
}

/* This routine returns bM (b as a row vector times M, i.e. the
   transpose of M applied to b)
*/
vector bxm (double m[3][3], vector b)
{
    vector t;

    t.x = m[0][0]*b.x + m[1][0]*b.y + m[2][0]*b.z;
    t.y = m[0][1]*b.x + m[1][1]*b.y + m[2][1]*b.z;
    t.z = m[0][2]*b.x + m[1][2]*b.y + m[2][2]*b.z;
    return (t);
}

/* This routine returns b1.b2
*/
double vector_dot (vector b1, vector b2)
{
    return (b1.x*b2.x + b1.y*b2.y + b1.z*b2.z);
}

/* This routine returns |v|
*/
double vector_length (vector v)
{
    return (sqrt(v.x*v.x + v.y*v.y + v.z*v.z));
}

/* This routine returns |v|^2
*/
double vector_length2 (vector v)
{
    return (v.x*v.x + v.y*v.y + v.z*v.z);
}

RANDOM NUMBER GENERATOR - RANDOM.C

This is the pseudo-random number library.

#include <time.h>
/*
   This function returns a random number in [0,1).
   It uses a linear-congruential method.

   ran(0.0) initializes the random number seed with a time-dependent value
   and returns the value of the seed that the generator recognizes.
   ran(1.0) returns the next number in the random sequence.
   Other arguments initialize the seed with the user-supplied value.
   Initializing the generator with a seed from the sequence will cause the
   subsequent ran(1.0) to generate the next value of the sequence.
   This is useful, for example, to shut down and start up the generator
   without a loss of continuity in the sequence.
   Values >= 1 or < 0 are not recommended.
   It has a period of M.
*/
double ran (double dummy)
{
    static long int ix;
    double rm = 566927.0, rm2 = 1.0/rm;
    long int k = 5701, j = 3621, m = 566927, tmp;

    /* make sure parameters not too far off */
    if (dummy > 2.0) dummy = 2.0;
    if (dummy < -2.0) dummy = -2.0;
    if (dummy != 1.0)
    {
        if ((tmp = dummy*rm) < m)
            ix = tmp;
        else
            ix = m-1;
        if (ix < 0)
            ix = 0;
    } else
        ix = (j*ix + k) % m;
    return (ix * rm2);
}
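
The generator above advances its state with the recurrence ix = (j*ix + k) mod m. The sketch below isolates that recurrence with the same constants so that it can be exercised without the static state and the seeding logic. The names lcg_next, lcg_to_unit and lcg_cycle_length are hypothetical. The cycle-length helper is included only as a way to check a period claim empirically; no particular result is asserted here.

```c
/* Constants as they appear in the listing. */
#define LCG_J 3621L
#define LCG_K 5701L
#define LCG_M 566927L

/* One step of the recurrence.  The largest intermediate value,
   3621*566926 + 5701, still fits in a 32-bit signed long. */
long lcg_next(long ix)
{
    return (LCG_J*ix + LCG_K) % LCG_M;
}

/* Map a state in 0..m-1 onto [0,1). */
double lcg_to_unit(long ix)
{
    return ix / (double)LCG_M;
}

/* Number of steps until the state first returns to `start`.
   Returns -1 if it has not returned within m steps. */
long lcg_cycle_length(long start)
{
    long ix = lcg_next(start), n = 1;

    while (ix != start && n < LCG_M) {
        ix = lcg_next(ix);
        n++;
    }
    return (ix == start) ? n : -1;
}
```

Calling lcg_cycle_length(0) and comparing the result with m is one way to test whether a given choice of multiplier and increment really achieves the full period.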

/*
   This function returns a pseudo-random number in (0,1).
   This is a more robust pseudo-random number generator than a simple
   linear-congruential generator is.
   It uses three linear congruential generators to get one random number.

   ran2(0.0) initializes the generator with time-dependent values.
   ran2(1.0) returns a pseudo-random number.
   Other arguments are used as an initializing seed.
   Arguments >= 1 or <= 0 are ill-advised.
   It has a period of (m1-1)(m2-1)(m3-1)/4.
*/
double ran2 (double dummy)
{
    double f1=1.0/30269.0, f2=1.0/30307.0, f3=1.0/30323.0, tmp;
    int m1=30269, m2=30307, m3=30323, seed, itmp;
    static int x,y,z;

    /* make sure parameters not too far off */
    if (dummy > 1.1) dummy = 1.1;
    if (dummy < -1.1) dummy = -1.1;
    if (dummy != 1.0)
    {
        /* initialize with user's seed value */
        if ((itmp = dummy*m1) < m1)
            seed = itmp;
        else
            seed = m1-1;
        if (seed < 1) seed = 1;

        /* initialize first generator */
        x = seed;

        /* initialize second generator */
        y = 172 * (x % 176) - 35 * (x/176);
        if (y < 0) y += m2;

        /* initialize third generator */
        z = 170 * (y % 178) - 63 * (y/178);
        if (z < 0) z += m3;
    }

    /* first generator */
    x = 171 * (x % 177) - 2 * (x/177);
    if (x < 0) x += m1;

    /* second generator */
    y = 172 * (y % 176) - 35 * (y/176);
    if (y < 0) y += m2;

    /* third generator */
    z = 170 * (z % 178) - 63 * (z/178);
    if (z < 0) z += m3;

    /* amalgamated result */
    itmp = tmp = x*f1 + y*f2 + z*f3;
    return (tmp - itmp);
}
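
The three-generator scheme above keeps its state in static variables, which makes it awkward to run several independent streams or to test in isolation. The sketch below restates the same update rules with the state held in a caller-owned structure. The names wh_state, wh_seed and wh_next are hypothetical, and the seeding follows the clamping used in the listing rather than any time-dependent initialization.

```c
/* Caller-owned state for the three combined generators. */
typedef struct { int x, y, z; } wh_state;

/* Seed the three generators from one integer, clamped to 1..30268. */
void wh_seed(wh_state *s, int seed)
{
    if (seed < 1) seed = 1;
    if (seed > 30268) seed = 30268;
    s->x = seed;
    s->y = 172 * (s->x % 176) - 35 * (s->x / 176);
    if (s->y < 0) s->y += 30307;
    s->z = 170 * (s->y % 178) - 63 * (s->y / 178);
    if (s->z < 0) s->z += 30323;
}

/* Advance all three generators and return the fractional part of the
   sum of their scaled states. */
double wh_next(wh_state *s)
{
    double sum;

    s->x = 171 * (s->x % 177) - 2 * (s->x / 177);
    if (s->x < 0) s->x += 30269;
    s->y = 172 * (s->y % 176) - 35 * (s->y / 176);
    if (s->y < 0) s->y += 30307;
    s->z = 170 * (s->z % 178) - 63 * (s->z / 178);
    if (s->z < 0) s->z += 30323;

    sum = s->x/30269.0 + s->y/30307.0 + s->z/30323.0;
    return sum - (int)sum;
}
```

Two states seeded with the same value produce identical sequences, which is convenient for reproducing a Monte Carlo run.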

C INCLUDE FILES

GLOBAL VARIABLE TYPES - PEP_TYPE.H

/* Global types used in the program */
typedef enum {FALSE, TRUE} logical;
typedef enum {BAD, G, A, V, L, I, S, T, D, E, Q, K, H, R, F, Y,
              W, M, P}
        acid_label;
typedef enum {UNKNOWN, nonCunit, Cunit} unit_label;
typedef struct {
    double x,y,z;
} vector;
typedef struct {
    vector axis;
    int atom_num;
    int bond[MAX_BONDS];
} connector;
typedef struct bond_struct {
    connector tail;
    struct rigid_unit_struct *next;
} bond_type;
typedef char *string;



typedef struct {
    char name[NAME_LENGTH];
    char type[NAME_LENGTH];
    double charge, ri, ei;
    vector position;
    acid_label residue;
    int residue_num;
} atom_info;
typedef struct rigid_unit_struct {
    unit_label type;
    connector head;
    int list_num;
    int n_bonds;
    bond_type **bond;
    int n_atoms;
    atom_info *atom;
} rigid_unit;
typedef struct {
    atom_info *p;
    vector position;
} atom_list;
typedef struct {
    char type1[NAME_LENGTH], type2[NAME_LENGTH],
         type3[NAME_LENGTH], type4[NAME_LENGTH];
    double v0[3], phi0[3];
} torsion_data;
typedef struct torsion_list_struct {
    int num[4];
    torsion_data *p;
    int degen;
    struct torsion_list_struct *next;
} torsion_list;
typedef struct {
    char type[NAME_LENGTH];
    double ri, ei;
} lj_data;
typedef struct {
    char type1[NAME_LENGTH], type2[NAME_LENGTH];
    double a, b;
} hbond_data;
typedef struct hbond_list_struct {
    int num[2];
    hbond_data *p;
    struct hbond_list_struct *next;
} hbond_list;
typedef struct {
    rigid_unit *unit, *prev;
} regrowth;

GLOBAL VARIABLES - PEP_VAR.H

/* Global variables used in the program */
#if defined (MAIN)
#define EXT extern
#else
#define EXT
#endif
EXT torsion_data **torsion_data_list;
EXT lj_data **lj_data_list;
EXT hbond_data **hbond_data_list;
#undef EXT

GLOBAL FUNCTIONS - PEPTIDE.H

/* Include files needed by peptide code */
#include <stdio.h>
#include <float.h>
#include <math.h>
#include <fcntl.h>
#include <stdio.h>
#include <memory.h>
#include <malloc.h>
#include <string.h>
#include <search.h>
#include <stdlib.h>
#include <errno.h>
#include <string.h>
#include <time.h>
#include <varargs.h>

/* global constants */
#define BETA 1.6886683 /* kB T at 298K */
#define MAX_BONDS 8
#define PI 3.1415927
#define EPS 1.0E-9
#define NAME_LENGTH 10
#define KMAX 100
#define MAX_ROOTS 100
#define DPHI .01

/* global macros */
#define INTERVAL(a, n1, n2) ((a) >= (n1) && (a) < (n2))

/* Include files relevant to this program */
#include "pep_type.h"
#include "pep_var.h"

/* random.c */
double ran (double dummy);
double ran2 (double dummy);

/* peptide1.c */
void out_of_memory (void);
void get_sequence (string **sequence, int *n_peptides);
rigid_unit *read_peptide_data (string sequence, int *n_atoms_total,
                               int *max_atoms_per_unit);
rigid_unit *read_unit (string file, acid_label label, int residue_num,
                       int *n_atoms_total, int *max_atoms_per_unit);
void couple_unit (rigid_unit *unit1, rigid_unit *unit2);
rigid_unit *modify_cystine_ends (rigid_unit *unit, int n_amino_acids,
                                 int *n_atoms_total);
void get_main_side (rigid_unit *unit, regrowth *main, regrowth *side,
                    int *n_main, int *n_side);
void read_torsion_data (void);
void read_lj_data (void);
void read_hbond_data (void);
void write_car_file (int n_amino_acids, int n_atoms_total, atom_list *atom,
                     string file);
string getline (string line, int len, FILE *fp);
void strip (string string);
void decomma (string string);
void capitalize (string s);
void amino_acid_code_3 (acid_label label, string code_3);
void amino_acid_code_1 (acid_label label, char code_1);
acid_label amino_acid_code (char code_1);
/* peptide2.c */
void initialize_connection_table (int **bond_table, int n_atoms_total);
void make_connection_table (int **bond_table, int *table_num,
                            rigid_unit *unit, rigid_unit *start);
void add_connection (int **bond_table, int i1, int i2);
void print_connection_table (int **bond_table, int n_atoms_total);
void get_torsions (torsion_list **p, int **bond_table, int *table_num,
                   atom_list *atom, rigid_unit *unit, rigid_unit *start);
torsion_list *add_torsion (int **bond_table, atom_list *atom, int i, int j,
                           int k, int l);
logical lookup_torsion_data (string type1, string type2, string type3,
                             string type4, torsion_data **p);
void print_torsions (torsion_list *list, atom_list *atom);
double torsion (vector p1, vector p2, vector p3, vector p4);
void assign_lj_parameters (rigid_unit *unit, rigid_unit *start);
logical lookup_lj_data (string type, double *ri, double *ei);
logical lookup_lj_data (string type, double *ri, double *ei);
void get_hbonds (hbond_list **list, atom_list *atom, int n_atoms);
logical lookup_hbond_data (string type1, string type2, hbond_data **p);
void print_hbonds (hbond_list *l, atom_list *atom);
void assign_atom_pointers (int *list_num, rigid_unit *unit,
                           rigid_unit *start, atom_list *atom);

/* peptide3.c */
void old_unit (int *list_num, int n0, int n1, int n2, double *logrosen,
               rigid_unit *unit, rigid_unit *start, torsion_list *t,
               hbond_list *l, atom_list *atom, vector *twig[],
               vector p0, vector b0);
void do_unit (int *list_num, int n0, int n1, int n2, double *logrosen,
              rigid_unit *unit, rigid_unit *start, torsion_list *t,
              hbond_list *l, atom_list *atom, vector *twig[],
              vector p0, vector b0, double *e);
void do_backbone_f (int i, int n_main, int n_atoms_total,
                    double *logrosen,
                    regrowth *main, regrowth *side,
                    torsion_list *t, hbond_list *l,
                    atom_list *atom, vector *twig[],
                    double *e, logical new);
void do_backbone_f_rigid (int i, int n_main, int n_atoms_total,
                          double *logrosen,
                          regrowth *main, regrowth *side,
                          torsion_list *t, hbond_list *l,
                          atom_list *atom, atom_info *atom_tmp,
                          vector *twig[],
                          double *e, logical new);
void do_backbone_b (int i, int n_main, int n_atoms_total,
                    double *logrosen,
                    regrowth *main, regrowth *side,
                    torsion_list *t, hbond_list *l,
                    atom_list *atom, vector *twig[],
                    double *e, logical new);
void do_backbone_b_rigid (int i, int n_main, int n_atoms_total,
                          double *logrosen,
                          regrowth *main, regrowth *side,
                          torsion_list *t, hbond_list *l,
                          atom_list *atom, atom_info *atom_tmp,
                          vector *twig[],
                          double *e, logical new);
void do_unit_sub (int *list_num, int n0, int n1, int n2, double *logrosen,
                  rigid_unit *unit, torsion_list *t, hbond_list *l,
                  atom_list *atom, vector *twig[], vector p1, vector b1,
                  vector p0, vector b0, double *e,
                  vector p[MAX_BONDS], vector b[MAX_BONDS], logical new);
void add_rigid_unit (rigid_unit *unit, vector *pos,
                     vector p1, vector b1, vector p0,
                     vector b0, vector point[MAX_BONDS],
                     vector bond[MAX_BONDS],
                     double cos_theta2, double sin_theta2);
vector align (vector p, vector r0, vector r1, vector n, double cos_theta,
              double sin_theta, vector n2, double cos_theta2,
              double sin_theta2);

/* peptide4.c */
double delta_energy (torsion_list *t, hbond_list *l, atom_list *atom,
                     vector *twig, int n_atoms, int n0, int n1, int n2,
                     int n_twig);
double energy (torsion_list *t, hbond_list *l, atom_list *atom,
               int n_atoms_total);
double d_nonbond_energy (torsion_list *t, atom_list *atom, vector *twig,
                         int n_atoms, int n0, int n1, int n2, int n_twig);
double nonbond_energy (torsion_list *t, atom_list *atom,
                       int n_atoms_total);
double d_hbond_energy (hbond_list *l, atom_list *atom, vector *twig,
                       int n_atoms, int n0, int n1, int n2, int n_twig);
double hbond_energy (hbond_list *l, atom_list *atom);
double d_torsion_energy (torsion_list *t, atom_list *atom, vector *twig,
                         int n_atoms, int n0, int n1, int n2, int n_twig);
double torsion_energy (torsion_list *t, atom_list *atom);
/* peptide5.c */
void do_mc (rigid_unit *unit, torsion_list *t, hbond_list *l,
            atom_list *atom, atom_list *atom2, atom_info *atom_tmp,
            vector *twig[], regrowth *main, regrowth *side,
            int n_amino_acids, int n_atoms_total, int n_main, int n_side,
            logical cyclic);
void read_restart (atom_list *atom, int n_atoms_total);
void read_cycle (torsion_list *t, hbond_list *l,
                 atom_list *atom, regrowth *main, regrowth *side,
                 vector *twig[], int n_main, int n_side,
                 int n_atoms_total);
void regrow_main (torsion_list *t, hbond_list *l,
                  atom_list *atom, atom_list *atom2,
                  atom_info *atom_tmp, vector *twig[],
                  regrowth *main, regrowth *side,
                  int n_main, int n_atoms_total, double *e);
void regrow_side (torsion_list *t, hbond_list *l,
                  atom_list *atom, atom_list *atom2, vector *twig[],
                  regrowth *main, regrowth *side,
                  int n_side, int n_atoms_total, double *e);

/* peptide6.c */
void rotate_main (atom_list *atom, atom_list *atom2, vector *twig[],
                  regrowth *main, regrowth *side, torsion_list *t,
                  hbond_list *l, int n_main, int n_atoms_total,
                  double *e);
void get_rot_params (atom_list *atom, regrowth *main, int i0, int n_main);
void get_rot_rosenbluth (atom_list *atom, atom_list *atom2,
                         vector *twig[], regrowth *main,
                         torsion_list *t, hbond_list *l, int i0,
                         int n_main, int n_atoms_total, int *n, int *j,
                         double *logrosen, double *e);
double jac (vector r[7]);
vector rotate_r1 (atom_list *atom, regrowth *main, int i0, int n_main);
void get_r (double phi1, double phi2, double phi3, double phi4,
            double phi5);
void do_rotation (atom_list *atom, vector *twig, regrowth *main, int i0,
                  int n_main, int n_atoms_total);
void get_phi1 (double phi[MAX_ROOTS][6], int *n);
void get_root (double x0, double x1, double *p1, double *p2, double *p3,
               double *p4, double *p5, int n);
void F5init (vector q2, double *phi1);
void F5 (double phi1, double phi2[4], double phi3[4], double phi4[4],
         double phi5[4], double f[4], logical valid[4]);
/* peptide7.c */
vector vector_rotate (vector a, vector n, double cos_theta,
                      double sin_theta);
vector get_main_b0 (atom_list *atom, regrowth *main, int i);
vector get_main_p0 (atom_list *atom, regrowth *main, int i);
vector get_side_b0 (atom_list *atom, regrowth *side, int i);
vector get_side_p0 (atom_list *atom, regrowth *side, int i);
void flory_rot_matrix (double theta, double phi, double m[3][3]);
vector flory_rot (double theta, double phi, vector a);
vector flory_rotinv (double theta, double phi, vector a);
void flory_lab (double m[3][3], vector r, vector l);
void flory_labinv (double m[3][3], vector r, vector l);
vector vector_cross (vector a, vector b);
vector vector_scale (vector v, double r);
void mxm (double m[3][3], double n[3][3]);
double det5 (double m[5][5]);
double det (double m[3][3]);
vector mxb (double m[3][3], vector b);
vector bxm (double m[3][3], vector b);
double vector_dot (vector b1, vector b2);
double vector_length (vector v);
double vector_length2 (vector v);

DATA FILES DEFINING GEOMETRIC STRUCTURE

DATA FILE FOR UNIT A - UNITA.DAT

! data file for rigid unit A--the NH2 terminus
1 !rigid unit in this structure
! ATOM INFORMATION
! rigid unit 0
3 !atoms in this rigid unit
N    0.039019567 -0.028048204  0.000005808 ALAn 1 NT N -0.4S3
HN1 -0.294595420  0.946419656  0.000007165 ALAn 1 H  H  0.126
HN2 -0.309849501 -0.509882152 -0.840834498 ALAn 1 H  H  0.126
! BOND INFORMATION
! rigid unit 0
0 1 2 -1 -1 !ending of incoming bond--doesn't mean anything, but must not be 1
0 0 .00000001 !beginning of incoming bond--just an overall displacement
1 !bond out from this unit
-1 !don't know which unit this bond goes to
0 1 2 -1 -1 !beginning of outgoing backbone bond
1.498959541 -0.043336947 -0.000000042 !ending of outgoing bond
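
In the unit files, each atom record appears to carry nine whitespace-separated fields: atom name, x, y, z, residue name, residue number, atom type, element, and partial charge, with "!" introducing a comment. The sketch below parses one such record with sscanf. It is only an illustration of that assumed layout; the names atom_record and parse_atom_record are hypothetical and are not the listing's read_unit routine.

```c
#include <stdio.h>

/* Hypothetical record type mirroring the nine fields of an atom line. */
typedef struct {
    char   name[10];
    double x, y, z;
    char   residue[10];
    int    residue_num;
    char   type[10];
    char   element[10];
    double charge;
} atom_record;

/* Parse one atom line.  Returns 1 if all nine fields were read,
   0 otherwise (for example on a "!" comment line). */
int parse_atom_record(const char *line, atom_record *a)
{
    int n = sscanf(line, "%9s %lf %lf %lf %9s %d %9s %9s %lf",
                   a->name, &a->x, &a->y, &a->z,
                   a->residue, &a->residue_num,
                   a->type, a->element, &a->charge);

    return n == 9;
}
```

Comment lines and count lines fail the nine-field test, so a reader can use the return value to separate atom records from the surrounding header lines.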



DATA FILE FOR UNIT B - UNITB.DAT

! data file for rigid unit B--the CH alpha carbon unit
1 !rigid unit in this structure
! ATOM INFORMATION
! rigid unit 0
2 !atoms in this rigid unit
CA 4.047343731 2.755753756 -0.000011837 ALA 2 CT C 0.035
HA 3.779272556 3.294512749 -0.928205431 ALA 2 HC H 0.032
! BOND INFORMATION
! rigid unit 0
0 1 -1 -1 -1 !ending of incoming backbone bond
3.370934725 1.461895347 -0.000009674 !beginning of incoming backbone bond
2 !bonds out from this unit
-1 !don't know which unit this bond goes to
0 1 -1 -1 -1 !beginning of outgoing side-chain bond
3.538550615 3.547572851 1.217100978 !ending of outgoing side-chain bond
-1 !don't know which unit this bond goes to
0 1 -1 -1 -1 !beginning of outgoing backbone bond
5.547336102 2.582198620 -0.000015057 !ending of outgoing backbone bond

DATA FILE FOR UNIT C - UNITC.DAT

! data file for rigid unit C--the OCNH amide bond unit
1 !rigid unit in this structure
! ATOM INFORMATION
! rigid unit 0
4 !atoms in this rigid unit
C  2.054825068 1.360626340  0.000001071 ALAn 1 C C  0.616
O  1.320880890 2.356072187  0.011419594 ALAn 1 O O -0.504
N  3.370934725 1.461895347 -0.000009674 ALA 2 N N -0.463
HN 3.917454243 0.530382395 -0.000003380 ALA 2 H H  0.252
! BOND INFORMATION
! rigid unit 0
0 1 2 -1 -1 !ending of incoming main-chain bond
1.498959541 -0.043336947 -0.000000042 !beginning of incoming main-chain bond
1 !bond out from this unit
-1 !don't know which unit this bond goes to
2 0 3 -1 -1 !beginning of outgoing main-chain bond
4.047343731 2.755753756 -0.000011837 !ending of outgoing main-chain bond

DATA FILE FOR UNIT D - UNITD.DAT

! data file for rigid unit D--the HCO terminus
1 !rigid unit in this structure
! ATOM INFORMATION
! rigid unit 0
3 !atoms in this rigid unit
C  8.274295807 5.082911491 -0.000008575 ALAN 3 C  C  0.616
HC 9.361082077 5.166533947 -0.000010758 ALAN 3 HC H  0.000
O  7.540351391 6.078356743  0.011415332 ALAN 3 O  O -0.504
! BOND INFORMATION
! rigid unit 0
0 1 2 -1 -1 !ending of incoming main-chain bond
7.718430996 3.678948641 -0.000013665 !beginning of incoming main-chain bond
0 !bonds out from this unit



DATA FILE FOR ALANINE - A.DAT

! The side-chain structure file for Alanine
1 !rigid unit in side-chain
! ATOM INFORMATION
! rigid unit 0
4 !atoms in this rigid unit
CB  3.178086281 3.790203094 1.217109203 ALA 2 CT C -0.098
HB1 3.502361059 4.845792770 1.274110079 ALA 2 HC H  0.038
HB2 2.072028160 3.800241470 1.180677295 ALA 2 HC H  0.038
HB3 3.465983868 3.309211969 2.172164917 ALA 2 HC H  0.038
! BOND INFORMATION
! rigid unit 0
0 1 2 3 -1 !ending of incoming bond for unit 0 and nn
3.783586502 3.069634676 -0.000003090 !beginning of bond for unit 0
0 !bonds out from rigid unit 0

DATA FILE FOR CYSTEINE - C.DAT

! The side-chain structure file for Cysteine
! Do not modify the atom order in this file
2 !rigid units in side-chain
! ATOM INFORMATION
! rigid unit 0
3 !atoms in this rigid unit
CB  3.185384274 3.813543320 1.210355163 CYSH 2 CT C -0.060
HB1 2.082855701 3.742515087 1.217666388 CYSH 2 HC H  0.038
HB2 3.528102398 3.371057510 2.168041706 CYSH 2 HC H  0.038
! rigid unit 1
4 !atoms in this rigid unit
SG  3.628824234 5.564641953 1.166115854 CYSH 2 SH S  0.827
LG1 2.774378061 6.223292828 1.382826447 CYSH 2 LP L -0.481
LG2 4.018448353 5.879447937 0.188784361 CYSH 2 LP L -0.481
HG  4.543437004 5.521058083 2.133599997 CYSH 2 HS H  0.135
! BOND INFORMATION
! rigid unit 0
0 1 2 -1 -1 !ending of incoming bond for unit 0 and nn
3.783586502 3.069634914 -0.000003354 !beginning of bond for unit 0
1 !bonds out from rigid unit 0
1 !unit 0 is bonded to unit 1
0 1 2 -1 -1 !beginning of outgoing bond and nn
3.628824234 5.564641953 1.168115854 !ending of outgoing bond for unit 0
! rigid unit 1
0 1 2 3 -1 !ending of incoming bond for unit 1 and nn
3.185384274 3.813543320 1.210355163 !beginning of bond for unit 1
0 !bonds out from rigid unit 1

DATA FILE FOR ASPARTATE - D.DAT

! The side-chain structure file for Aspartate
2 !rigid units in side-chain
! ATOM INFORMATION
! rigid unit 0
3 !atoms in this rigid unit
CB  3.195193052 3.859569550 1.198083878 ASP 2 CT C -0.398
HB1 2.099623203 3.734851122 1.256908774 ASP 2 HC H  0.071
HB2 3.574837923 3.424842119 2.144523859 ASP 2 HC H  0.071
! rigid unit 1
3 !atoms in this rigid unit
CG  3.488366127 5.366341114 1.240691185 ASP 2 C  C  0.714
OD1 3.752036572 5.965095997 2.273211718 ASP 2 O2 O -0.721
OD2 3.445515871 5.949848175 0.005213364 ASP 2 O2 O -0.721
! BOND INFORMATION
! rigid unit 0
0 1 2 -1 -1 !ending of incoming bond for unit 0 and nn
3.783586502 3.069634438 -0.000003352 !beginning of bond for unit 0
1 !bonds out from rigid unit 0
1 !unit 0 is bonded to unit 1
0 1 2 -1 -1 !beginning of outgoing bond and nn
3.488366127 5.366341114 1.240691185 !ending of outgoing bond for unit 0
! rigid unit 1
0 1 2 -1 -1 !ending of incoming bond for unit 1 and nn
3.195193052 3.859569550 1.198083878 !beginning of bond for unit 1
0 !bonds out from rigid unit 1

DATA FILE FOR GLUTAMATE - E.DAT

! The side-chain structure file for Glutamate
3 !rigid units in side-chain
! ATOM INFORMATION
! rigid unit 0
3 !atoms in this rigid unit
CB  3.210191727 3.806770086 1.242457986 GLU 2 CT C -0.184
HB1 3.453276873 4.884052753 1.160096049 GLU 2 HC H  0.092
HB2 2.103818655 3.775332928 1.193925381 GLU 2 HC H  0.092
! rigid unit 1
3 !atoms in this rigid unit
CG  3.670672178 3.303917646 2.650651217 GLU 2 CT C -0.398
HG1 3.495624304 2.214699984 2.732162237 GLU 2 HC H  0.071
HG2 4.766S38143 3.410970449 2.754028797 GLU 2 HC H  0.071
! rigid unit 2
3 !atoms in this rigid unit
CD  3.044564962 3.944746017 3.891577959 GLU 2 C  C  0.714
OE1 3.318646908 3.594962835 5.031950951 GLU 2 O2 O -0.721
OE2 2.157183647 4.937835217 3.607111931 GLU 2 O2 O -0.721
! BOND INFORMATION
! rigid unit 0
0 1 2 -1 -1 !ending of incoming bond for unit 0 and nn
3.783586502 3.069634438 -0.000003351 !beginning of bond for unit 0
1 !bonds out from rigid unit 0
1 !unit 0 is bonded to unit 1
0 1 2 -1 -1 !beginning of outgoing bond and nn
3.670672178 3.303917646 2.650651217 !ending of outgoing bond for unit 0
! rigid unit 1
0 1 2 -1 -1 !ending of incoming bond for unit 1 and nn
3.210191727 3.806770086 1.242457986 !beginning of bond for unit 1
1 !bonds out from rigid unit 1
2 !unit 1 is bonded to unit 2
0 1 2 -1 -1 !beginning of outgoing bond and nn
3.044564962 3.944746017 3.891577959 !ending of outgoing bond for unit 1
! rigid unit 2
0 1 2 -1 -1 !ending of incoming bond for unit 1 and nn
3.670672178 3.303917646 2.650651217 !beginning of bond for unit 2
0 !bonds out from rigid unit 2

DATA FILE FOR PHENYLALANINE - F.DAT

! The side-chain structure file for Phenylalanine
2 !rigid units in side-chain
! ATOM INFORMATION
! rigid unit 0
3 !atoms in this rigid unit
CB  3.271046400 3.829343796 1.261018753 PHE 2 CT C -0.100
HB1 3.711064339 3.375446320 2.172759056 PHE 2 HC H  0.108
HB2 3.680548668 4.858696938 1.261503935 PHE 2 HC H  0.108
! rigid unit 1
11 !atoms in this rigid unit
CG  1.746863961 3.913921356 1.435616050 PHE 2 CA C -0.100
CD1 1.070973635 2.894981861 2.116770267 PHE 2 CA C -0.150
HD1 1.621361971 2.061387062 2.533305407 PHE 2 HC H  0.150
CD2 1.019180536 4.963639259 0.869901121 PHE 2 CA C -0.150
HD2 1.52804S277 5.750367641 0.331381440 PHE 2 HC H  0.150
CE1 0.315989435 2.915796280 2.214086056 PHE 2 CA C -0.150
HE1 0.830357015 2.108316422 2.715482712 PHE 2 HC H  0.150
CE2 0.369023502 4.989082813 0.977358818 PHE 2 CA C -0.150
HE2 0.928361893 5.798536777 0.531342983 PHE 2 HC H  0.150
CZ  1.036266327 3.964326382 1.646436572 PHE 2 CA C -0.150
HZ  2.113304853 3.975853443 1.718335271 PHE 2 HC H  0.150
! BOND INFORMATION
! rigid unit 0
0 1 2 -1 -1 !ending of incoming bond and nn
3.783586264 3.069634914 -0.000003353 !beginning of bond
1 !bonds out
1 !unit bonded to
0 1 2 -1 -1 !beginning of outgoing bond and nn
1.746863961 3.913921356 1.435816050 !ending of outgoing bond
! rigid unit 1
0 1 3 -1 -1 !ending of incoming bond and nn
3.271046400 3.829343796 1.261018753 !beginning of bond
0 !bonds out

DATA FILE FOR GLYCINE - G.DAT

! The side-chain structure file for Glycine
1 !rigid unit in side-chain
! ATOM INFORMATION
! rigid unit 0
1 !atom in this rigid unit
HA2 2.054570675 -0.518772364 -0.887896836 GLYN 1 HC H 0.032
! BOND INFORMATION
! rigid unit 0
0 -1 -1 -1 -1 !ending of incoming bond for unit 0 and nn
1.612465143 -0.031237146 -0.000000015 !beginning of incoming bond for unit 0
0 !bonds out from rigid unit 0



DATA FILE FOR HISTIDINE - H.DAT

! The side-chain structure file for Histidine
2 !rigid units in side-chain
! ATOM INFORMATION
! rigid unit 0
3 !atoms in this rigid unit
CB  3.239844084 3.731920242 1.277127385 HIS 2 CT C -0.098
HB1 2.644425392 3.025787830 1.893024564 HIS 2 HC H  0.038
HB2 4.064783096 4.071127415 1.934927344 HIS 2 HC H  0.038
! rigid unit 1
8 !atoms in this rigid unit
CG  2.370461226 4.918142319  0.978080690 HIS 2 CC C  0.251
ND1 2.062596560 5.403582573 -0.290515751 HIS 2 NB N -0.502
CE1 1.272076607 6.440367222  0.045922592 HIS 2 CR C  0.241
NE2 1.048720956 6.674089432  1.367565274 HIS 2 NA N -0.146
CD2 1.767608762 5.675839901  1.972463250 HIS 2 CW C -0.184
HE1 0.858503580 7.036557198 -0.757577479 HIS 2 HC H  0.036
HE2 0.480951071 7.411210537  1.809884906 HIS 2 H  H  0.228
HD2 1.867301583 5.485908508  3.037219763 HIS 2 HC H  0.114
! BOND INFORMATION
! rigid unit 0
0 1 2 -1 -1 !ending of incoming bond for unit 0 and nn
3.783586502 3.069634438 -0.000003353 !beginning of bond for unit 0
1 !bonds out from rigid unit 0
1 !unit 0 is bonded to unit 1
0 1 2 -1 -1 !beginning of outgoing bond and nn
2.370461226 4.918142319 0.978080690 !ending of outgoing bond for unit 0
! rigid unit 1
0 1 4 -1 -1 !ending of incoming bond for unit 1 and nn
3.222899199 3.830397844 1.236912012 !beginning of bond for unit 1
0 !bonds out from rigid unit 1

DATA FILE FOR ISOLEUCINE - I.DAT
! The side-chain structure file for Isoleucine
4 ! rigid units in side-chain
! ATOM INFORMATION
! rigid unit 0
2 ! atoms in this rigid unit
CB 3.184130907 3.905461311 1.203313947 ILE 2
CT C -0.012
HB 3.579479933 3.448693275 2.135145664 ILE 2
HC H 0.022
! rigid unit 1
4 ! atoms in this rigid unit
CG2 3.632628202 5.399640560 1.184555411 ILE 2
CT C -0.085
HG21 3.256929159 5.962747097 2.057613134 ILE 2
HC H 0.029
HG22 4.728721142 5.525658131 1.229067683 ILE 2
HC H 0.029
HG23 3.277012348 5.929985046 0.281316549 ILE 2
HC H 0.029
! rigid unit 2
3 ! atoms in this rigid unit
CG1 1.625806093 3.868085861 1.310235620 ILE 2
CT C -0.049
HG11 1.169472456 4.395492077 0.450418025 ILE 2
HC H 0.027
HG12 1.273633957 2.823534966 1.211708426 ILE 2
HC H 0.027
! rigid unit 3
4 ! atoms in this rigid unit
CD1 1.028863907 4.391342163 2.632859945 ILE 2
CT C -0.085
HD11 -0.068560459 4.262083530 2.654643297 ILE 2
HC H 0.028
HD12 1.436750174 3.852109432 3.508637428 ILE 2
HC H 0.028
HD13 1.222232699 5.466014240 2.787941933 ILE 2
HC H 0.028

! BOND INFORMATION
! rigid unit 0
0 1 -1 -1 -1 !ending of incoming bond and nn
3.783586502 3.069634438 -0.000003350 !beginning of bond
2 ! bonds out
1 !unit bonded to
0 1 -1 -1 -1 ! beginning of outgoing bond and nn
3.632628202 5.399640560 1.184555411 !ending of outgoing bond
2 !unit bonded to
0 1 -1 -1 -1 ! beginning of outgoing bond and nn
1.625806093 3.868085861 1.310235620 !ending of outgoing bond
! rigid unit 1
0 1 2 3 -1 !ending of incoming bond and nn
3.184130907 3.905461311 1.203313947 ! beginning of incoming bond
0 ! bonds out
! rigid unit 2
0 1 2 -1 -1 !ending of incoming bond and nn
3.184130907 3.905461311 1.203313947 ! beginning of incoming bond
1 ! bonds out
3 !unit bonded to
0 1 2 -1 -1 ! beginning of outgoing bond and nn
1.028863907 4.391342163 2.632859945 !ending of outgoing bond
! rigid unit 3
0 1 2 3 -1 !ending of incoming bond and nn
1.625806093 3.868085861 1.310235620 !beginning of bond
0 ! bonds out

DATA FILE FOR LYSINE - K.DAT 

! The side-chain structure file for Lysine
5 ! rigid units in side-chain
! ATOM INFORMATION
! rigid unit 0
3 ! atoms in this rigid unit
CB 3.218223095 3.829745770 1.231236458 LYS 2
CT C -0.098
HB1 2.112416506 3.764609814 1.234413505 LYS 2
HC H 0.038
HB2 3.536234617 3.317805290 2.163102627 LYS 2
HC H 0.038
! rigid unit 1
3 ! atoms in this rigid unit
CG 3.638167858 5.320005417 1.281187057 LYS 2
CT C -0.160
HG1 4.741127968 5.406830788 1.274424553 LYS 2
HC H 0.116
HG2 3.295989990 5.833013058 0.360635072 LYS 2
HC H 0.116
! rigid unit 2
3 ! atoms in this rigid unit
CD 3.153400660 6.084614754 2.516160011 LYS 2
CT C -0.180
HD1 2.046517849 6.074027538 2.552636147 LYS 2
HC H 0.122
HD2 3.501233101 5.571547031 3.435809374 LYS 2
HC H 0.122
! rigid unit 3
3 ! atoms in this rigid unit
CE 3.699187756 7.518018246 2.469964743 LYS 2
CT C -0.038
HE1 4.805956841 7.515174866 2.558616400 LYS 2
HC H 0.098
HE2 3.475801945 8.000639915 1.495867610 LYS 2
HC H 0.098
! rigid unit 4
4 ! atoms in this rigid unit
NZ 3.098134756 8.306216240 3.560437918 LYS 2
N3 N -0.138
HZ1 3.463554621 9.268757820 3.530759573 LYS 2
H3 H 0.294
HZ2 2.074491024 8.324481964 3.447653770 LYS 2
H3 H 0.294
HZ3 3.335658073 7.877095222 4.466163158 LYS 2
H3 H 0.294
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! BOND INFORMATION 
! rigid unit 0
0 1 2 -1 -1 !ending of incoming bond and nn
3.783586502 3.069634914 -0.000003353 !beginning of bond
1 ! bonds out
1 !unit bonded to
0 1 2 -1 -1 ! beginning of outgoing bond and nn
3.638167858 5.320005417 1.281187057 !ending of outgoing bond
! rigid unit 1
0 1 2 -1 -1 !ending of incoming bond and nn
3.218223095 3.829745770 1.231236458 ! beginning of bond
1 ! bonds out
2 !unit bonded to
0 1 2 -1 -1 ! beginning of outgoing bond and nn
3.153400660 6.084614754 2.516160011 !ending of outgoing bond
! rigid unit 2
0 1 2 -1 -1 !ending of incoming bond and nn
3.638167858 5.320005417 1.281187057 !beginning of bond
1 ! bonds out
3 !unit bonded to
0 1 2 -1 -1 ! beginning of outgoing bond and nn
3.699187756 7.518018246 2.469964743 !ending of outgoing bond
! rigid unit 3
0 1 2 -1 -1 !ending of incoming bond and nn
3.153400660 6.084614754 2.516160011 ! beginning of bond
1 ! bonds out
4 !unit bonded to
0 1 2 -1 -1 ! beginning of outgoing bond and nn
3.098134756 8.306216240 3.560437918 !ending of outgoing bond
! rigid unit 4
0 1 2 3 -1 !ending of incoming bond and nn
3.699187756 7.518018246 2.469964743 !beginning of bond
0 ! bonds out
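
A simple consistency check on a listing such as K.DAT is to sum the partial charges per rigid unit and over the whole side chain. The following is an illustrative Python sketch and is not part of the program listings of this specification; the dictionary layout and the function names are assumptions made for the example, and the charge values are copied from the Lysine atom records.

```python
# Partial charges copied from the K.DAT atom records, grouped by rigid unit.
# The grouping into a dict keyed by unit number is an assumption of this sketch.
LYS_CHARGES = {
    0: [-0.098, 0.038, 0.038],         # CB, HB1, HB2
    1: [-0.160, 0.116, 0.116],         # CG, HG1, HG2
    2: [-0.180, 0.122, 0.122],         # CD, HD1, HD2
    3: [-0.038, 0.098, 0.098],         # CE, HE1, HE2
    4: [-0.138, 0.294, 0.294, 0.294],  # NZ, HZ1, HZ2, HZ3
}


def unit_charges(charges_by_unit):
    """Return the summed partial charge of each rigid unit."""
    return {unit: sum(q) for unit, q in charges_by_unit.items()}


def net_charge(charges_by_unit):
    """Return the summed partial charge of the whole side chain."""
    return sum(sum(q) for q in charges_by_unit.values())


if __name__ == "__main__":
    for unit, q in unit_charges(LYS_CHARGES).items():
        print(f"rigid unit {unit}: {q:+.3f}")
    print(f"side chain total: {net_charge(LYS_CHARGES):+.3f}")
```

With the values listed, most of the side-chain charge sits on the terminal ammonium group (rigid unit 4), which is what one would expect for a protonated lysine.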
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DATA FILE FOR LEUCINE - L.DAT 



! The side-chain structure file for Leucine
4 ! rigid units in side-chain
! ATOM INFORMATION
! rigid unit 0
3 ! atoms in this rigid unit
CB 3.217977524 3.860693455 1.213688374 LEU 2
CT C -0.061
HB1 3.617908239 3.413237095 2.146348953 LEU 2
HC H 0.033
HB2 3.641148329 4.884153843 1.193638206 LEU 2
HC H 0.033
! rigid unit 1
2 ! atoms in this rigid unit
CG 1.676206470 3.974944353 1.357627273 LEU 2
CT C -0.010
HG 1.273801684 2.962582827 1.570222020 LEU 2
HC H 0.031
! rigid unit 2
4 ! atoms in this rigid unit
CD1 1.322771311 4.880306721 2.545703411 LEU 2
CT C -0.107
HD11 0.229164675 4.936426640 2.704123735 LEU 2
HC H 0.034
HD12 1.758654118 4.507015228 3.491832256 LEU 2
HC H 0.034
HD13 1.684926391 5.916738033 2.406197309 LEU 2
HC H 0.034
! rigid unit 3
4 ! atoms in this rigid unit
CD2 0.998154640 4.504262924 0.083184890 LEU 2
CT C -0.107
HD21 -0.093163513 4.622812748 0.214309067 LEU 2
HC H 0.034
HD22 1.406615853 5.481475830 -0.234147355 LEU 2
HC H 0.034
HD23 1.130140185 3.802904606 -0.761629283 LEU 2
HC H 0.034

! BOND INFORMATION
! rigid unit 0
0 1 2 -1 -1 !ending of incoming bond and nn
3.783586502 3.069634438 -0.000003367 !beginning of bond
1 ! bonds out
1 !unit bonded to
0 1 2 -1 -1 ! beginning of outgoing bond and nn
1.676206470 3.974944353 1.357627273 !ending of outgoing bond
! rigid unit 1
0 1 -1 -1 -1 !ending of incoming bond and nn
3.184130907 3.905461311 1.203313947 !beginning of incoming bond
2 ! bonds out
2 !unit bonded to
0 1 -1 -1 -1 ! beginning of outgoing bond and nn
1.322771311 4.880306721 2.545703411 !ending of outgoing bond
3 !unit bonded to
0 1 -1 -1 -1 ! beginning of outgoing bond and nn
0.998154640 4.504262924 0.083184890 !ending of outgoing bond
! rigid unit 2
0 1 2 3 -1 !ending of incoming bond and nn
1.676206470 3.974944353 1.357627273 !beginning of incoming bond
0 ! bonds out
! rigid unit 3
0 1 2 3 -1 !ending of incoming bond and nn
1.676206470 3.974944353 1.357627273 !beginning of bond
0 ! bonds out

DATA FILE FOR METHIONINE - M.DAT 

! The side-chain structure file for Methionine
4 ! rigid units in side-chain
! ATOM INFORMATION
! rigid unit 0
3 ! atoms in this rigid unit
CB 3.219568014 3.840672970 1.225060582 MET 2
CT C -0.151
HB1 3.547865868 3.348565578 2.163037539 MET 2
HC H 0.027
HB2 3.671003819 4.850576401 1.262409329 MET 2
HC H 0.027
! rigid unit 1
3 ! atoms in this rigid unit
CG 1.685955524 4.011272907 1.265707970 MET 2
CT C -0.054
HG1 1.291312337 4.382569790 0.302083224 MET 2
HC H 0.0652
HG2 1.199923158 3.034499168 1.452733874 MET 2
HC H 0.0652
! rigid unit 2
3 ! atoms in this rigid unit
SD 1.234688163 5.162067413 2.574714422 MET 2
S S 0.737
LD1 1.486726403 6.202064514 2.319993973 MET 2
LP L -0.381
LD2 1.747960329 4.937880516 3.521441460 MET 2
LP L -0.381
! rigid unit 3
4 ! atoms in this rigid unit
CE -0.532971203 4.837210655 2.617241383 MET 2
CT C -0.134
HE1 -0.987815082 4.991072178 1.622043610 MET 2
HC H 0.0652
HE2 -1.033426285 5.510134220 3.335405111 MET 2
HC H 0.0652
HE3 -0.725545764 3.794905424 2.929581165 MET 2
HC H 0.0652

! BOND INFORMATION
! rigid unit 0
0 1 2 -1 -1 !ending of incoming bond and nn
3.783586502 3.069634438 -0.000003354 !beginning of bond
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1 ! bonds out 

1 !unit bonded to
0 1 2 -1 -1 ! beginning of outgoing bond and nn
1.685955524 4.011272907 1.265707970 !ending of outgoing bond
! rigid unit 1
0 1 2 -1 -1 !ending of incoming bond and nn
3.219568014 3.840672970 1.225060582 !beginning of bond
1 ! bonds out
2 !unit bonded to
0 1 2 -1 -1 ! beginning of outgoing bond and nn
1.234688163 5.162067413 2.574714422 !ending of outgoing bond
! rigid unit 2
0 1 2 -1 -1 !ending of incoming bond and nn
1.685955524 4.011272907 1.265707970 !beginning of bond
1 ! bonds out
3 !unit bonded to
0 1 2 -1 -1 ! beginning of outgoing bond and nn
-0.532971203 4.837210655 2.617241383 !ending of outgoing bond
! rigid unit 3
0 1 2 3 -1 !ending of incoming bond and nn
1.234688163 5.162067413 2.574714422 ! beginning of bond
0 ! bonds out

DATA FILE FOR ASPARAGINE - N.DAT

! The side-chain structure file for Asparagine
2 ! rigid units in side-chain
! ATOM INFORMATION
! rigid unit 0
3 ! atoms in this rigid unit
CB 3.222899199 3.830397844 1.236912012 ASN 2
CT C -0.086
HB1 3.611397266 3.364436865 2.163546562 ASN 2
HC H 0.038
HB2 3.616078854 4.863478184 1.264652491 ASN 2
HC H 0.038
! rigid unit 1
5 ! atoms in this rigid unit
CG 1.698638678 3.892561436 1.381467938 ASN 2
C C 0.675
OD1 1.085211635 3.155725241 2.139311790 ASN 2
O O -0.470
ND2 1.031797171 4.746669292 0.652490914 ASN 2
N N -0.867
HD21 0.019928589 4.602556705 0.711063743 ASN 2
H H 0.344
HD22 1.562326550 5.282481670 -0.034363598 ASN 2
H H 0.344



! BOND INFORMATION
! rigid unit 0
0 1 2 -1 -1 !ending of incoming bond for unit 0 and nn
3.783586502 3.069634438 -0.000003353 !beginning of bond for unit 0
1 ! bonds out from rigid unit 0
1 !unit 0 is bonded to unit 1
0 1 2 -1 -1 ! beginning of outgoing bond and nn
1.698638678 3.892561436 1.381467938 !ending of outgoing bond for unit 0
! rigid unit 1
0 1 2 -1 -1 !ending of incoming bond for unit 1 and nn
3.222899199 3.830397844 1.236912012 !beginning of bond for unit 1
0 ! bonds out from rigid unit 1
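
The bond records pair a "beginning" coordinate with an "ending" atom, so the length of each inter-unit bond can be checked directly from the listing. The sketch below is illustrative Python rather than part of the program listings; it assumes that the "beginning of bond for unit 0" coordinate is the backbone atom to which CB is attached, and the variable names are chosen for this example only.

```python
import math

# Coordinates copied from the N.DAT listing (Asparagine).
# ANCHOR is the "beginning of bond for unit 0" point; treating it as the
# backbone attachment atom is an assumption of this sketch.
ANCHOR = (3.783586502, 3.069634438, -0.000003353)
CB = (3.222899199, 3.830397844, 1.236912012)
CG = (1.698638678, 3.892561436, 1.381467938)


def bond_length(a, b):
    """Euclidean distance between two points given as (x, y, z) tuples."""
    return math.dist(a, b)


if __name__ == "__main__":
    print(f"anchor-CB: {bond_length(ANCHOR, CB):.3f}")
    print(f"CB-CG:     {bond_length(CB, CG):.3f}")
```

Both distances come out a little above 1.5, in the units of the listing, which is in the range usually quoted for carbon-carbon single bonds when the coordinates are in angstroms.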


DATA FILE FOR GLUTAMINE - Q.DAT 

! The side-chain structure file for Glutamine
3 ! rigid units in side-chain
! ATOM INFORMATION
! rigid unit 0
3 ! atoms in this rigid unit
CB 3.221223593 3.805351734 1.236027122 GLN 2
CT C -0.098
HB1 2.115758896 3.733683825 1.223282218 GLN 2
HC H 0.038
HB2 3.538368225 3.258102417 2.148239136 GLN 2
HC H 0.038
! rigid unit 1
3 ! atoms in this rigid unit
CG 3.619170427 5.311230183 1.384292126 GLN 2
CT C -0.102
HG1 4.719832420 5.417502403 1.395145655 GLN 2
HC H 0.057
HG2 3.298108339 5.879051685 0.491232127 GLN 2
HC H 0.057
! rigid unit 2
5 ! atoms in this rigid unit
CD 3.148421526 6.090956688 2.618209839 GLN 2
C C 0.675
OE1 3.471138716 7.255728722 2.789397001 GLN 2
O O -0.470
NE2 2.408394814 5.500250816 3.521779537 GLN 2
N N -0.867
HE21 2.231919527 4.508390427 3.353902817 GLN 2
H H 0.344
HE22 2.192787886 6.069860935 4.342392445 GLN 2
H H 0.344

! BOND INFORMATION
! rigid unit 0
0 1 2 -1 -1 !ending of incoming bond for unit 0 and nn
3.783586502 3.069634438 -0.000003353 !beginning of bond for unit 0
1 ! bonds out from rigid unit 0
1 !unit 0 is bonded to unit 1
0 1 2 -1 -1 ! beginning of outgoing bond and nn
3.619170427 5.311230183 1.384292126 !ending of outgoing bond for unit 0
! rigid unit 1
0 1 2 -1 -1 !ending of incoming bond for unit 1 and nn
3.221223593 3.805351734 1.236027122 !beginning of bond for unit 1
1 !bonds out from rigid unit 0
2 !unit 1 is bonded to unit 2
0 1 2 -1 -1 ! beginning of outgoing bond and nn
3.148421526 6.090956688 2.618209839 !ending of outgoing bond for unit 2
! rigid unit 2
0 1 2 -1 -1 !ending of incoming bond for unit 2 and nn
3.619170427 5.311230183 1.384292126 !beginning of bond for unit 2
0 ! bonds out from rigid unit 2
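
Each "bond and nn" record in the bond sections appears to hold five integers followed by a "!" comment. The Python sketch below shows one way such a line could be read. It is not taken from the program listings; it assumes that the first integer is the index of the bonded atom within its rigid unit, that the remaining integers are nearest-neighbour ("nn") atom indices, and that -1 is an unused-slot marker. The function name is hypothetical.

```python
def parse_index_line(line):
    """Split an index record into (atom_index, neighbour_indices).

    Text after the first '!' is treated as a comment. Entries equal to -1
    are assumed to be padding and are dropped from the neighbour list.
    """
    data = line.split("!", 1)[0]
    values = [int(token) for token in data.split()]
    if not values:
        raise ValueError("no integers found in record")
    atom_index = values[0]
    neighbours = [v for v in values[1:] if v != -1]
    return atom_index, neighbours


if __name__ == "__main__":
    print(parse_index_line("0 1 2 -1 -1 !ending of incoming bond and nn"))
    print(parse_index_line("7 5 8 -1 -1 ! beginning of outgoing bond and nn"))
```

Dropping the -1 entries at parse time keeps later code free of sentinel checks, at the cost of no longer knowing how many slots the original record had.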

DATA FILE FOR ARGININE - R.DAT 



! The side-chain structure file for Arginine 
4 ! rigid units in side-chain
! ATOM INFORMATION
! rigid unit 0
3 ! atoms in this rigid unit
CB 3.207483053 3.819248199 1.232642174 ARG 2
CT C -0.080
HB1 2.121760130 3.616136551 1.319550753 ARG 2
HC H 0.056
HB2 3.644849300 3.393733978 2.159598827 ARG 2
HC H 0.056
! rigid unit 1
3 ! atoms in this rigid unit
CG 3.412360668 5.357305527 1.216631651 ARG 2
CT C -0.103
HG1 4.487451553 5.614737511 1.132990837 ARG 2
HC H 0.074
HG2 2.938670874 5.796108723 0.315252036 ARG 2
HC H 0.074
! rigid unit 2
3 ! atoms in this rigid unit
CD 2.850392818 6.038671017 2.471077681 ARG 2
CT C -0.228
HD1 1.769480824 5.816972256 2.580044270 ARG 2
HC H 0.133
HD2 3.353989840 5.649005413 3.379585028 ARG 2
HC H 0.133
! rigid unit 3
9 ! atoms in this rigid unit
NE 3.069616079 7.502031326 2.345978022 ARG 2
N2 N -0.324
HE 3.539865971 7.837357998 1.493146777 ARG 2
H3 H 0.269
CZ 2.710799694 8.413488388 3.240067959 ARG 2
CA C 0.760
NH1 2.972572088 9.643490791 2.971310854 ARG 2
N2 N -0.624
HH11 3.439955235 9.745957375 2.068439484 ARG 2
H3 H 0.361
HH12 2.697422743 10.348603249 3.651821136 ARG 2
H3 H 0.361
NH2 2.114365101 8.144207001 4.363539696 ARG 2
N2 N -0.624
HH21 1.888047814 8.930854797 4.969158173 ARG 2
H3 H 0.361
HH22 1.947107434 7.146794796 4.499028206 ARG 2
H3 H 0.361



! BOND INFORMATION
! rigid unit 0
0 1 2 -1 -1 !ending of incoming bond for unit 0 and nn
3.783586502 3.069634914 -0.000003315 !beginning of bond for unit 0
1 !bond out from rigid unit 0
1 !unit 0 is bonded to unit 1
0 1 2 -1 -1 ! beginning of outgoing bond and nn
3.412360668 5.357305527 1.216631651 !ending of outgoing bond for unit 0
! rigid unit 1
0 1 2 -1 -1 !ending of incoming bond for unit 0 and nn
3.207483053 3.819248199 1.232642174 !beginning of bond for unit 1
1 !bond out from rigid unit 1
2 !unit 1 is bonded to unit 2
0 1 2 -1 -1 ! beginning of outgoing bond and nn
2.850392818 6.038671017 2.471077681 !ending of outgoing bond
! rigid unit 2
0 1 2 -1 -1 ! ending of incoming bond for unit 0 and nn
3.412360668 5.357305527 1.216631651 !beginning of bond for unit 2
1 !bond out from rigid unit 2
3 !unit 2 is bonded to unit 3
0 1 2 -1 -1 ! beginning of outgoing bond and nn
3.069616079 7.502031326 2.345978022 !ending of outgoing bond
! rigid unit 3
0 1 2 -1 -1 !ending of incoming bond for unit 0 and nn
2.850392818 6.038671017 2.471077681 ! beginning of bond for unit 3
0 ! bonds out from rigid unit 3

DATA FILE FOR SERINE - S.DAT 



! The side-chain structure file for Serine 

2 ! rigid units in side-chain
! ATOM INFORMATION
! rigid unit 0
3 ! atoms in this rigid unit
CB 3.203660250 3.871555328 1.191825747 SER 2
CT C 0.018
HB1 3.445731640 4.945727825 1.071671009 SER 2
HC H 0.119
HB2 2.097403765 3.828571320 1.202566266 SER 2
HC H 0.119
! rigid unit 1
2 ! atoms in this rigid unit
OG 3.711599350 3.433972597 2.457015276 SER 2
OH O -0.550
HG 3.430009127 2.523327112 2.580434084 SER 2
HO H 0.310

! BOND INFORMATION
! rigid unit 0
0 1 2 -1 -1 !ending of incoming bond for unit 0 and nn
3.783586502 3.069634438 -0.000003353 !beginning of bond for unit 0
1 ! bonds out from rigid unit 0
1 !unit 0 is bonded to unit 1
0 1 2 -1 -1 ! beginning of outgoing bond and nn
3.711599350 3.433972597 2.457015276 !ending of outgoing bond for unit 0
! rigid unit 1
0 1 -1 -1 -1 !ending of incoming bond for unit 1 and nn
3.203660250 3.871555328 1.191825747 !beginning of bond for unit 1
0 ! bonds out from rigid unit 1
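
The atom section of each side-chain file follows a regular pattern: a count of rigid units, then for each unit a count of atoms followed by two lines per atom (name, coordinates, residue name and number; then atom type, element and partial charge). The Python sketch below reads that pattern for the Serine data. It is an illustration written for this listing and is not one of the program listings of this specification; the `Atom` class, the function names and the assumption that "!" starts a comment are all choices made for the example.

```python
from dataclasses import dataclass


@dataclass
class Atom:
    name: str
    xyz: tuple
    residue: str
    resnum: int
    atom_type: str
    element: str
    charge: float


def data_lines(text):
    """Yield the data part of each line, skipping blanks and '!' comments."""
    for raw in text.splitlines():
        data = raw.split("!", 1)[0].strip()
        if data:
            yield data


def parse_atom_section(text):
    """Return a list of rigid units, each a list of Atom records."""
    lines = data_lines(text)
    n_units = int(next(lines))
    units = []
    for _ in range(n_units):
        n_atoms = int(next(lines))
        atoms = []
        for _ in range(n_atoms):
            name, x, y, z, residue, resnum = next(lines).split()
            atom_type, element, charge = next(lines).split()
            atoms.append(Atom(name, (float(x), float(y), float(z)),
                              residue, int(resnum), atom_type, element,
                              float(charge)))
        units.append(atoms)
    return units


# Atom section of S.DAT, with one record per line pair.
SERINE = """\
! The side-chain structure file for Serine
2 ! rigid units in side-chain
! ATOM INFORMATION
! rigid unit 0
3 ! atoms in this rigid unit
CB 3.203660250 3.871555328 1.191825747 SER 2
CT C 0.018
HB1 3.445731640 4.945727825 1.071671009 SER 2
HC H 0.119
HB2 2.097403765 3.828571320 1.202566266 SER 2
HC H 0.119
! rigid unit 1
2 ! atoms in this rigid unit
OG 3.711599350 3.433972597 2.457015276 SER 2
OH O -0.550
HG 3.430009127 2.523327112 2.580434084 SER 2
HO H 0.310
"""

units = parse_atom_section(SERINE)
```

The parser stops once the declared number of atoms has been read, so the bond section that follows in the real file would need a separate reader.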

DATA FILE FOR THREONINE - T.DAT

! The side-chain structure file for Threonine
3 ! rigid units in side-chain
! ATOM INFORMATION
! rigid unit 0
2 ! atoms in this rigid unit
CB 3.220216751 3.864162445 1.226425409 THR 2
CT C 0.170
HB 3.504307270 3.322291374 2.154003382 THR 2
HC H 0.082
! rigid unit 1
2 ! atoms in this rigid unit
OG1 1.802008867 3.940322876 1.161503792 THR 2
OH O -0.550
HG1 1.520381451 4.374082565 1.972538352 THR 2
HO H 0.310

! rigid unit 2 

4 ! atoms in this rigid unit 

CG2 3.680637360 5.331728935 1.361316323 THR 2
CT C -0.191
HG21 3.224400043 5.832503796 2.234619141 THR 2
HC H 0.065
HG22 4.774106026 5.420624733 1.502453089 THR 2
HC H 0.065
HG23 3.418393373 5.928008556 0.466874599 THR 2
HC H 0.065















! BOND INFORMATION
! rigid unit 0
0 1 -1 -1 -1 !ending of incoming bond and nn
3.783586502 3.069634438 -0.000003353 !beginning of bond
2 ! bonds out
1 !unit 0 is bonded
0 1 -1 -1 -1 ! beginning of outgoing bond and nn
1.802008867 3.940322876 1.161503792 !ending of outgoing bond for unit 0
2 !unit 0 is bonded
0 1 -1 -1 -1 ! beginning of outgoing bond and nn
3.680637360 5.331728935 1.361316323 !ending of outgoing bond for unit 0
! rigid unit 1
0 1 -1 -1 -1 !ending of incoming bond and nn
3.220216751 3.864162445 1.226425409 !beginning of bond for unit 1
0 ! bonds out
! rigid unit 2
0 1 2 3 -1 !ending of incoming bond and nn
3.220216751 3.864162445 1.226425409 !beginning of bond for unit 1
0 ! bonds out

DATA FILE FOR VALINE - V.DAT 

! The side-chain structure file for Valine
3 ! rigid units in side-chain
! ATOM INFORMATION
! rigid unit 0
2 ! atoms in this rigid unit
CB 3.211601496 3.852613449 1.247815728 VAL 2
CT C -0.012
HB 3.447319269 3.248452187 2.150032282 VAL 2
HC H 0.024
! rigid unit 1
4 ! atoms in this rigid unit
CG1 1.676198244 4.045934200 1.217347741 VAL 2
CT C -0.091
HG11 1.351996183 4.697401524 [illegible] VAL 2
HC H 0.031
HG12 1.142809749 3.084587097 1.[illegible] VAL 2
HC H 0.031
HG13 1.300095797 4.498250008 2.155061245 VAL 2
HC H 0.031
! rigid unit 2
4 ! atoms in this rigid unit
CG2 3.797980547 5.269292355 1.500991821 VAL 2
CT C -0.091
HG21 3.634918213 5.953960419 0.647068620 VAL 2
HC H 0.031
HG22 3.359194279 5.751780510 2.395626068 VAL 2
HC H 0.031
HG23 4.886912346 5.247161865 1.696415067 VAL 2
HC H 0.031

! BOND INFORMATION
! rigid unit 0
0 1 -1 -1 -1 !ending of incoming bond and nn
3.783586502 3.069634438 -0.000003354 !beginning of bond
2 ! bonds out
1 !unit bonded to
0 1 -1 -1 -1 ! beginning of outgoing bond and nn
1.676198244 4.045934200 1.217347741 ! ending of outgoing bond
2 !unit bonded to
0 1 -1 -1 -1 ! beginning of outgoing bond and nn
3.797980547 5.269292355 1.500991821 ! ending of outgoing bond
! rigid unit 1
0 1 2 3 -1 !ending of incoming bond and nn
3.211601496 3.852613449 1.247815728 ! beginning of outgoing bond
0 ! bonds out
! rigid unit 2
0 1 2 3 -1 !ending of incoming bond and nn
3.211601496 3.852613449 1.247815728 !beginning of outgoing bond
0 ! bonds out

DATA FILE FOR TRYPTOPHAN - W.DAT 



! The side-chain structure file for Tryptophan 

2 ! rigid units in side-chain
! ATOM INFORMATION
! rigid unit 0
3 ! atoms in this rigid unit
CB 3.247885227 3.809360981 1.256884575 TRP 2
CT C -0.098
HB1 3.555066347 3.270197153 2.175767183 TRP 2
HC H 0.038
HB2 3.728011608 4.802421093 1.350249052 TRP 2
HC H 0.038
! rigid unit 1
15 ! atoms in this rigid unit
CG 1.731538415 4.025276661 1.276940465 TRP 2
C* C -0.135
CD1 0.792832434 3.205200195 1.936712861 TRP 2
CW C 0.044
NE1 -0.527979255 3.628766537 1.692452073 TRP 2
NA N -0.352
CE2 -0.376119167 4.727549076 0.861387193 TRP 2
CN C 0.154
CD2 0.994750261 4.975831032 0.602216363 TRP 2
CB C 0.146
HD1 1.058894038 2.330861330 2.516448259 TRP 2
HC H 0.093
HE1 -1.402328849 3.197247982 2.011827707 TRP 2
H H 0.271
CE3 1.387488961 6.039774895 -0.250452638 TRP 2
CA C -0.173
HE3 2.430646658 6.226261139 -0.463923573 TRP 2
HC H 0.086
CZ3 0.392907262 6.841813087 -0.810243368 TRP 2
CA C -0.066
HZ3 0.674497783 7.661212444 -1.455789328 TRP 2
HC H 0.057
CH2 -0.963685811 6.602497578 -0.548699141 TRP 2
CA C -0.077
HH2 -1.710847259 7.243553162 -0.992942095 TRP 2
HC H 0.074
CZ2 [illegible] [illegible] [illegible] TRP 2
CA C -0.168
HZ2 [illegible].410887718 5.363564491 0.470484644 TRP 2
HC H 0.084















! BOND INFORMATION
! rigid unit 0
0 1 2 -1 -1 !ending of incoming bond and nn
3.783586740 3.069634914 -0.000003497 !beginning of bond
1 ! bonds out
1 !unit 0 is bonded
0 1 2 -1 -1 ! beginning of outgoing bond and nn
1.731538415 4.025276661 1.276940465 ! ending of outgoing bond for unit 0
! rigid unit 1
0 1 4 -1 -1 !ending of incoming bond and nn
3.247885227 3.809360981 1.256884575 !beginning of bond for unit 1
0 ! bonds out

DATA FILE FOR TYROSINE - Y.DAT 

! The side-chain structure file for Tyrosine
3 ! rigid units in side-chain
! ATOM INFORMATION
! rigid unit 0
3 ! atoms in this rigid unit
CB 3.293353796 3.842515945 1.259159327 TYR 2
CT C -0.098
HB1 3.703839302 3.358918667 2.169649363 TYR 2
HC H 0.038
HB2 3.749134064 4.852351665 1.277104497 TYR 2
HC H 0.038
! rigid unit 1
10 ! atoms in this rigid unit
CG 1.778211594 4.019127369 1.411828637 TYR 2
CA C -0.030
CD1 1.068759203 3.196300983 2.292453527 TYR 2
CA C -0.002
HD1 1.585003138 2.435774803 2.862824917 TYR 2
HC H 0.064
CD2 1.095163584 4.989490032 0.672801077 TYR 2
CA C -0.002
HD2 1.629922271 5.630218983 -0.014210327 TYR 2
HC H 0.064
CE1 -0.309100747 3.338460445 2.427857637 TYR 2
CA C -0.264
HE1 -0.845880806 2.691843510 3.105883360 TYR 2
HC H 0.102
CZ -0.983952701 4.304777145 1.686211467 TYR 2
C C 0.462
CE2 -0.283983082 5.129064560 0.809688389 TYR 2
CA C -0.264
HE2 -0.814125061 5.873366833 0.234044328 TYR 2
HC H 0.102
! rigid unit 2
2 ! atoms in this rigid unit
OH -2.337103367 4.443373203 1.815491915 TYR 2
OH O -0.528
HH -2.648404837 3.798558235 2.453088284 TYR 2
HO H 0.334

! BOND INFORMATION
! rigid unit 0
0 1 2 -1 -1 !ending of incoming bond and nn
3.783586264 3.069634914 -0.000003354 !beginning of bond
1 ! bonds out
1 !unit bonded to
0 1 2 -1 -1 ! beginning of outgoing bond and nn
1.778211594 4.019127369 1.411828637 ! ending of outgoing bond for unit 0
! rigid unit 1
0 1 3 -1 -1 !ending of incoming bond and nn
3.293353796 3.842515945 1.259159327 !beginning of bond for unit 1
1 ! bonds out
2 !unit bonded to
7 5 8 -1 -1 ! beginning of outgoing bond and nn
-2.337103367 4.443373203 1.815491915 !ending of outgoing bond for unit 0
! rigid unit 2
0 1 -1 -1 -1 !ending of incoming bond and nn
-0.983952701 4.304777145 1.686211467 !beginning of bond for unit 1
0 ! bonds out

DATA FILE FOR INITIAL PROTOTYPE - CX6C.CAR 

!BIOSYM archive 3
PBC=OFF
!DATE Thu Mar 2 10:02:29 1995
SG 0.051616628 8.775964550 2.653307337 CYSn 1
S S 0.824
LG1 -0.116704460 8.906803991 3.732450018 CYSn 1
LP L -0.405
LG2 -0.816371929 8.216369655 2.274560255 CYSn 1
LP L -0.405
CB 1.625257994 7.970290997 2.280061368 CYSn 1
CT C -0.098
HB1 1.743097230 7.117856362 2.972980432 CYSn 1
HC H 0.050
HB2 2.457560406 8.667686711 2.5Q6611212 CYSn 1
HC H 0.050
CA 1.664891168 7.503978115 0.811322158 CYSn 1
CT C 0.035
HA 2.715618613 7.453348875 0.469159517 CYSn 1
HC H 0.032
N 0.954382540 8.512673633 0.003030230 CYSn 1
NT N -0.463
C 1.063568189 6.132700222 0.616111991 CYSn 1
C C 0.616
O 0.248707622 5.654726837 1.414398016 CYSn 1
O O -0.504
N 1.449902196 5.479885680 -0.464156147 GLY 2
N N -0.463
HN 2.157106102 5.992384244 -1.099457509 GLY 2
H H 0.252
CA 0.868490592 4.154014497 -0.652902307 GLY 2
CT C 0.035
HA1 1.550908149 3.403064022 -0.212395307 GLY 2
HC H 0.032
HA2 -0.097660558 4.132736815 -0.116611463 GLY 2
HC H 0.032
C 0.730531165 3.827591429 -2.120728786 GLY 2
C C 0.616
O 1.559375145 4.206208097 -2.957020570 GLY 2
O O -0.504
N -0.320742949 3.103195380 -2.456098946 GLY 3
N N -0.463
HN -0.976177839 2.817016114 -1.646836012 GLY 3
H H 0.252
CA -0.454134161 2.787581074 -3.875321662 GLY 3
CT C 0.035
HA1 -0.907422830 1.783240810 -3.972773051 GLY 3
HC H 0.032
HA2 -1.127648566 3.540414569 -4.323795441 GLY 3
HC H 0.032
C 0.896974016 2.736484179 -4.547627543 GLY 3
C C 0.616
O 1.315189212 1.712629073 -5.101282348 GLY 3
O O -0.504
N 1.599575272 3.853622667 -4.520184621 GLY 4
N N -0.463
HN 1.137216234 4.691535216 -4.019658253 GLY 4
H H 0.252
CA 2.905944550 3.8C4217731 -5.170228610 GLY 4
CT C 0.035
HA1 3.056204584 2.789614618 -5.584558431 GLY 4
HC H 0.032
HA2 2.897891721 4.540755026 -5.994216851 GLY 4
HC H 0.032
C 4.014980067 4.050747291 -4.175561433 GLY 4
C C 0.616
O 4.978871195 4.780583329 -4.436272241 GLY 4
O O -0.504
N 3.887759074 3.450944950 -3.006608050 GLY 5
N N -0.463
HN 3.003276191 2.844372268 -2.879487738 GLY 5
H H 0.252
CA 4.960071382 3.689311240 -2.044877031 GLY 5
CT C 0.035
HA1 5.709592998 2.881830301 -2.144167698 GLY 5
HC H 0.032
HA2 5.427393718 4.658369322 -2.297948016 GLY 5
HC H 0.032
C 4.437174470 3.643619035 -0.629041435 GLY 5
C C 0.616
O 3.798322352 2.676595378 -0.197242766 GLY 5
O O -0.504
N 4.713663113 4.691871185 0.124033264 GLY 6
N N -0.463
HN 5.286002166 5.476492875 -0.348403798 GLY 6
H H 0.252
CA 4.208080753 4.647691975 1.492986659 GLY 6
CT C 0.035
HA1 3.303800182 4.010943092 1.515218779 GLY 6
HC H 0.032
HA2 4.993057374 4.194323221 2.125265975 GLY 6
HC H 0.032
C 3.799265981 6.023038258 1.963510280 GLY 6
C C 0.616
O 4.006824522 7.036283245 1.285298717 GLY 6
O O -0.504
N 3.195690211 6.077750863 3.136158080 GLY 7
N N -0.463
HN 3.055107813 5.133307510 3.640799839 GLY 7
H H 0.252
CA 2.800412417 7.407555656 3.591101372 GLY 7
CT C 0.035
HA1 1.946687677 7.303619509 4.286815466 GLY 7
HC H 0.032
HA2 3.660862081 7.847316876 4.127520148 GLY 7
HC H 0.032
C 2.334578164 8.258959996 2.434291753 GLY 7
C C 0.616
O 2.337411236 9.494643783 2.487154063 GLY 7
O O -0.504
N 1.936206121 7.605756209 1.358640986 CYSN 8
N N -0.463
HN 1.983632457 6.528240768 1.414418956 CYSN 8
H H 0.252
CA 1.485796919 8.428968216 0.240136508 CYSN 8
CT C 0.035
HA 0.399931102 8.271042216 0.100059529 CYSN 8
HC H 0.032
C 2.167493478 8.018162291 1.043072620 CYSN 8
C C 0.616
CB 1.746659419 9.902481747 0.610166221 CYSN 8
CT C -0.098
HB1 2.709270705 10.016688002 1.140264476 CYSN 8
HC H 0.050
HB2 1.816139488 10.541353385 0.293951287 CYSN 8
HC H 0.050
SG 0.440719361 10.532225816 1.688457720 CYSN 8
S S 0.824
LG1 -0.404239097 10.957145937 1.126774557 CYSN 8
LP L -0.405
LG2 0.793091788 11.329491558 2.359427872 CYSN 8
LP L -0.405
end
end
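
The prototype in CX6C.CAR has a cysteine at each end (residues 1 and 8), so a quick geometric check is the distance between the two SG atoms. The Python sketch below is illustrative only and is not one of the program listings; the coordinates are copied from the two SG records of the listing, the variable names are chosen for this example, and the reading of the result as a disulfide bridge assumes that the coordinates are in angstroms.

```python
import math

# SG coordinates copied from the CX6C.CAR listing.
SG_RES1 = (0.051616628, 8.775964550, 2.653307337)   # SG of residue 1
SG_RES8 = (0.440719361, 10.532225816, 1.688457720)  # SG of residue 8


def distance(a, b):
    """Euclidean distance between two (x, y, z) points."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))


if __name__ == "__main__":
    print(f"SG(1)-SG(8) distance: {distance(SG_RES1, SG_RES8):.3f}")
```

The value is close to 2.04, which is consistent with a sulfur-sulfur bond closing the ring of the cyclic prototype.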






END OF LISTING 

DATA FILE WEINER FORCES - AMBER.FRC

!BIOSYM forcefield 2
#version amber.frc 1.0 19-Oct-90
#version amber.frc 1.1 8-Aug-92
#define amber

> This is the new format version of the amber forcefield 
!Ver Ref Function              Label
1.0  1   atom_types            amber
1.0  1   equivalence           amber
1.0  1   hbond_definition      amber
1.0  1   quadratic_bond        amber
1.0  1   quadratic_angle       amber
1.0  1   torsion_3             amber
1.0  1   out_of_plane          amber
1.0  1   nonbond(12-6)         amber
1.0  1   hydrogen_bond(10-12)  amber
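
The "#define" block above is a small four-column table (version, reference number, function name, label). The Python sketch below shows one way to read such rows. It is an illustration for this listing, not a description of how any Biosym program reads the file; the function name is hypothetical, and the rule that lines starting with "!", "#" or ">" are not data rows is an assumption based on how those characters are used in the listing.

```python
def parse_define_rows(text):
    """Return (version, reference, function, label) tuples from a #define block.

    Lines that are blank or start with '!', '#' or '>' are skipped; treating
    them as headers or comments is an assumption of this sketch.
    """
    rows = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line[0] in "!#>":
            continue
        version, reference, function, label = line.split()
        rows.append((version, int(reference), function, label))
    return rows


SAMPLE = """\
#define amber
> This is the new format version of the amber forcefield
!Ver Ref Function Label
1.0 1 atom_types amber
1.0 1 quadratic_bond amber
1.0 1 nonbond(12-6) amber
"""

rows = parse_define_rows(SAMPLE)
```

Keeping the version as a string avoids any floating-point comparison when rows with versions such as 1.0 and 1.1 are later matched against each other.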



#atom_types amber 

> Atom type definitions for any variant of amber 

> Masses from CRC 1973/74 pages B-250. 

!Ver Ref Type Mass Element Comment 



1.0  1  C   12.000000  C   Kollman's Field: Masses from CRC 1973/74 pages B-250.
1.0  1  C*  12.000000  C
1.0  1  C2  12.000000  C
1.0  3  C3  15.000000  C
1.0  1  CA  12.000000  C
1.0  1  CB  12.000000  C
1.0  1  CC  12.000000  C
1.0  3  CD  13.000000  C
1.0  3  CE  13.000000  C
1.0  3  CF  13.000000  C
1.0  3  CG  13.000000  C
1.0  3  CH  13.000000  C
1.0  3  CI  13.000000  C
1.0  3  CJ  13.000000  C
1.0  1  CK  12.000000  C
1.0  1  CM  12.000000  C
1.0  1  CN  12.000000  C
1.0  3  CP  13.000000  C
1.0  1  CQ  12.000000  C
1.0  1  CR  12.000000  C
1.0  1  CT  12.000000  C
1.0  1  CV  12.000000  C
1.0  1  CW  12.000000  C
1.0  1  H   1.007825   H
1.0  1  H2  1.007825   H
1.0  1  H3  1.007825   H
1.0  1  HC  1.007825   H
1.0  1  HO  1.007825   H
1.0  1  HS  1.007825   H
1.0  3  LP  3.000000   H
1.0  1  N   14.003070  N
1.0  1  N*  14.003070  N
1.0  1  N2  14.003070  N
1.0  1  N3  14.003070  N
1.0  1  NA  14.003070  N
1.0  1  NB  14.003070  N
1.0  1  NC  14.003070  N
1.0  1  NP  14.003070  N
1.0  1  NT  14.003070  N
1.0  1  O   15.994910  O
1.0  1  O2  15.994910  O
1.0  1  OH  15.994910  O
1.0  1  OS  15.994910  O
1.0  1  P   30.993760  P
1.0  1  S   31.972070  S
1.0  1  SH  31.972070  S
1.0  3  CO  40.080000  Ca
1.0  3  HW  1.008000   H
1.0  3  IN  35.450000  Cl
1.0  3  CU  63.550000  Cu
1.0  3  I   22.990000  I
1.0  3  MG  24.305000  Mg
1.0  3  OW  16.000000  O
1.0  3  QC  132.90000  Cs
1.0  3  QK  39.100000  K
1.0  3  QL  6.940000   Li
1.0  3  QN  22.990000  Na
1.0  3  QR  85.470000  Rb
1.1  4  CS  12.000000  C   carbohydrate sp3 carbon
1.1  4  AC  12.000000  C   carbohydrate alpha-anomeric carbon
1.1  4  BC  12.000000  C   carbohydrate beta-anomeric carbon
1.1  4  HT  1.007825   H   carbohydrate sp3 hydrogen
1.1  4  AH  1.007825   H   carbohydrate alpha-anomeric hydrogen
1.1  4  BH  1.007825   H   carbohydrate beta-anomeric hydrogen
1.1  4  HY  1.007825   H   carbohydrate hydroxyl hydrogen










1.1 


4 


OT 


15.994910 


0 


carbohydrate hydroxyl 


oxygen 












1.1 


4 


OA 


15.994910 


0 


carbohydrate alpha-anomeric 


oxygen 












1.1 


4 


OB 


15.994910 


0 


carbohydrate beta-anomeric 


oxygen 












1.1 


4 


OE 


15.994910 


0 


carbohydrate ring oxygen 


1.0 


1 


h$ 


1.007825 


H 


Hydrogen atom for aTOMATIC 


PARAMETER assignment 






1.0 


1 


c$ 


12 . 000000 


C 


Carbon atom for automatic 
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parameter assignment 

1.0 1 n$ 14.003070 N 
parameter assignment 

1.0 1 o$ 15.994910 0 
parameter assignment 

1.0 1 S$ 31.972070 S 
parameter assignment 

1.0 1 p$ 30.993760 

automatic parameter assignment 
#equivalence amber 

> Equivalence table for any variant of amber 

Equivalences 



PCT/US96/04229 
Nitrogen atom for automatic 

Oxyg n atom for automatic 

Sulfur atom for automatic 

Phosphorous atom for 



Ver 

_ _ _ ^ 


Ref 

_ _ _ 


Type 


NonB 


Bond 


Angle 


Torsion 


001 


1.0  1   C    C    C    C    C    C
1.0  1   C*   C*   C*   C*   C*   C*
1.0  1   C2   C2   C2   C2   C2   C2
1.0  1   C3   C3   C3   C3   C3   C3
1.0  1   CA   CA   CA   CA   CA   CA
1.0  1   CB   CB   CB   CB   CB   CB
1.0  1   CC   CC   CC   CC   CC   CC
1.0  1   CD   CD   CD   CD   CD   CD
1.0  1   CE   CE   CE   CE   CE   CE
1.0  1   CF   CF   CF   CF   CF   CF
1.0  1   CG   CG   CG   CG   CG   CG
1.0  1   CH   CH   CH   CH   CH   CH
1.0  1   CI   CI   CI   CI   CI   CI
1.0  1   CJ   CJ   CJ   CJ   CJ   CJ
1.0  1   CK   CK   CK   CK   CK   CK
1.0  1   CM   CM   CM   CM   CM   CM
1.0  1   CN   CN   CN   CN   CN   CN
1.0  1   CP   CP   CP   CP   CP   CP
1.0  1   CQ   CQ   CQ   CQ   CQ   CQ
1.0  1   CR   CR   CR   CR   CR   CR
1.0  1   CT   CT   CT   CT   CT   CT
1.0  1   CV   CV   CV   CV   CV   CV
1.0  1   CW   CW   CW   CW   CW   CW
1.0  1   H    H    H    H    H    H
1.0  1   H2   H2   H2   H2   H2   H2
1.0  1   H3   H3   H3   H3   H3   H3
1.0  1   HC   HC   HC   HC   HC   HC
1.0  1   HO   HO   HO   HO   HO   HO
1.0  1   HS   HS   HS   HS   HS   HS
1.0  1   LP   LP   LP   LP   LP   LP
1.0  1   N    N    N    N    N    N
1.0  1   N*   N*   N*   N*   N*   N*
1.0  1   N2   N2   N2   N2   N2   N2
1.0  1   N3   N3   N3   N3   N3   N3
1.0  1   NA   NA   NA   NA   NA   NA
1.0  1   NB   NB   NB   NB   NB   NB
1.0  1   NC   NC   NC   NC   NC   NC
1.0  1   NP   NP   NP   NP   NP   NP
1.0  1   NT   NT   NT   NT   NT   NT
1.0  1   O    O    O    O    O    O
1.0  1   O2   O2   O2   O2   O2   O2
1.0  1   OH   OH   OH   OH   OH   OH
1.0  1   OS   OS   OS   OS   OS   OS
1.0  1   P    P    P    P    P    P
1.0  1   S    S    S    S    S    S
1.0  1   SH   SH   SH   SH   SH   SH
1.0  3   I    I    I    I    I    I
1.0  3   CU   CU   CU   CU   CU   CU
1.0  3   IM   IM   IM   IM   IM   IM
1.0  3   CO   CO   CO   CO   CO   CO
1.0  3   HW   HW   HW   HW   HW   HW
1.0  3   MG   MG   MG   MG   MG   MG
1.0  3   OW   OW   OW   OW   OW   OW
1.0  3   QC   QC   QC   QC   QC   QC
1.0  3   QK   QK   QK   QK   QK   QK
1.0  3   QL   QL   QL   QL   QL   QL
1.0  3   QN   QN   QN   QN   QN   QN
1.0  3   QR   QR   QR   QR   QR   QR
1.1  4   CS   CS   CS   CS   CS   CS
1.1  4   AC   AC   AC   AC   AC   AC
1.1  4   BC   BC   BC   BC   BC   BC
1.1  4   HT   HT   HT   HT   HT   HT
1.1  4   AH   AH   AH   AH   AH   AH
1.1  4   BH   BH   BH   BH   BH   BH
1.1  4   HY   HY   HY   HY   HY   HY
1.1  4   OT   OT   OT   OT   OT   OT
1.1  4   OA   OA   OA   OA   OA   OA
1.1  4   OB   OB   OB   OB   OB   OB
1.1  4   OE   OE   OE   OE   OE   OE
1.0  1   h$   h$   h$   h$   h$   h$
1.0  1   c$   c$   c$   c$   c$   c$
1.0  1   n$   n$   n$   n$   n$   n$
1.0  1   o$   o$   o$   o$   o$   o$
1.0  1   s$   s$   s$   s$   s$   s$
1.0  1   p$   p$   p$   p$   p$   p$

#hbond_definition amber

1.0  1   distance   2.5000
1.0  1   angle      90.0000
1.0  1   donors     H HO H2 H3 HS
1.0  1   acceptors  NB NC O2 O OH S SH

#quadratic_bond amber

> E = K2 * (R - R0)^2

!Ver Ref I    J    R0       K2

1.0  3   OW   HW   0.9572   553.0000
1.0  3   HW   HW   1.5136   553.0000
1.0  3   CH   N3   1.4710   367.0000
1.0  3   C3   SH   1.8100   222.0000
1.0  1   C    C2   1.5220   317.0000
1.0  1   C    C3   1.5220   317.0000
1.0  1   C    CA   1.4000   469.0000
1.0  1   C    CB   1.4190   447.0000
1.0  1   C    CD   1.4000   469.0000
1.0  1   C    CH   1.5220   317.0000
1.0  1   C    CJ   1.4440   410.0000
1.0  1   C    CM   1.4440   410.0000
1.0  3   C    CT   1.5220   317.0000
1.0  1   C    N    1.3350   490.0000
1.0  1   C    N*   1.3830   424.0000
1.0  1   C    NA   1.3880   418.0000
1.0  1   C    NC   1.3580   457.0000
1.0  1   C    O    1.2290   570.0000
1.0  1   C    O2   1.2500   656.0000
1.0  1   C    OH   1.3640   450.0000
1.0  1   C*   C2   1.4950   317.0000
1.0  1   C*   CB   1.4590   388.0000
1.0  1   C*   CG   1.3520   546.0000
1.0  1   C*   CT   1.4950   317.0000
1.0  1   C*   CW   1.3520   546.0000
1.0  1   C*   HC   1.0800   340.0000
1.0  1   C2   C2   1.5250   260.0000
1.0  1   C2   C3   1.5260   260.0000
1.0  1   C2   CA   1.5100   317.0000
1.0  1   C2   CC   1.5040   317.0000
1.0  1   C2   CH   1.5260   260.0000
1.0  1   C2   N    1.4490   337.0000
1.0  1   C2   N2   1.4630   337.0000
1.0  1   C2   N3   1.4710   367.0000
1.0  1   C2   NT   1.4710   367.0000
1.0  1   C2   OH   1.4250   386.0000
1.0  1   C2   OS   1.4250   320.0000
1.0  1   C2   S    1.8100   222.0000
1.0  1   C2   SH   1.8100   222.0000
1.0  1   C3   CH   1.5260   260.0000
1.0  1   C3   CM   1.5100   317.0000
1.0  1   C3   N    1.4490   337.0000
1.0  1   C3   N*   1.4750   337.0000
1.0  1   C3   N2   1.4630   337.0000
1.0  1   C3   N3   1.4710   367.0000
1.0  1   C3   OH   1.4250   386.0000
1.0  1   C3   OS   1.4250   320.0000
1.0  1   C3   S    1.8100   222.0000
1.0  1   CA   CA   1.4000   469.0000
1.0  1   CA   CB   1.4040   469.0000
1.0  1   CA   CD   1.4000   469.0000
1.0  1   CA   CJ   1.4330   427.0000
1.0  1   CA   CM   1.4330   427.0000
1.0  1   CA   CN   1.4000   469.0000
1.0  1   CA   CT   1.5100   317.0000
1.0  1   CA   HC   1.0800   340.0000
1.0  1   CA   N2   1.3400   481.0000
1.0  1   CA   NA   1.3810   427.0000
1.0  1   CA   NC   1.3390   483.0000
1.0  1   CB   CB   1.3700   520.0000
1.0  1   CB   CD   1.4000   469.0000
1.0  1   CB   CN   1.4190   447.0000
1.0  1   CB   N*   1.3740   436.0000
1.0  1   CB   NB   1.3910   414.0000
1.0  1   CB   NC   1.3540   461.0000
1.0  1   CC   CF   1.3750   512.0000
1.0  1   CC   CG   1.3710   518.0000
1.0  1   CC   CT   1.5040   317.0000
1.0  1   CC   CV   1.3750   512.0000
1.0  1   CC   CW   1.3710   518.0000
1.0  1   CC   NA   1.3850   422.0000
1.0  1   CC   NB   1.3940   410.0000
1.0  1   CD   CD   1.4000   469.0000
1.0  1   CD   CN   1.4000   469.0000
1.0  1   CE   N*   1.3710   440.0000
1.0  1   CE   NB   1.3040   529.0000
1.0  1   CF   NB   1.3940   410.0000
1.0  1   CG   NA   1.3810   427.0000
1.0  1   CH   CH   1.5260   260.0000
1.0  1   CH   N    1.4490   337.0000
1.0  1   CH   N*   1.4750   337.0000
1.0  1   CH   NT   1.4710   367.0000
1.0  1   CH   OH   1.4250   386.0000
1.0  1   CH   OS   1.4250   320.0000
1.0  1   CI   NC   1.3240   502.0000
1.0  1   CJ   CJ   1.3500   549.0000
1.0  1   CJ   CM   1.3500   549.0000
1.0  1   CJ   N*   1.3650   448.0000
1.0  1   CK   HC   1.0800   340.0000
1.0  1   CK   N*   1.3710   440.0000
1.0  1   CK   NB   1.3040   529.0000
1.0  1   CM   CM   1.3500   549.0000
1.0  1   CM   CT   1.5100   317.0000
1.0  1   CM   HC   1.0800   340.0000
1.0  1   CM   N*   1.3650   448.0000
1.0  1   CN   NA   1.3800   428.0000
1.0  1   CP   NA   1.3430   477.0000
1.0  1   CP   NB   1.3350   488.0000
1.0  1   CQ   HC   1.0800   340.0000
1.0  1   CQ   NC   1.3240   502.0000
1.0  1   CR   HC   1.0800   340.0000
1.0  1   CR   NA   1.3430   477.0000
1.0  1   CR   NB   1.3350   488.0000
1.0  1   CT   CT   1.5260   310.0000
1.0  1   CT   HC   1.0900   331.0000
1.0  3   CT   N    1.4490   337.0000
1.0  1   CT   N*   1.4750   337.0000
1.0  1   CT   N2   1.4630   337.0000
1.0  1   CT   N3   1.4710   367.0000
1.0  1   CT   OH   1.4100   320.0000
1.0  1   CT   OS   1.4100   320.0000
1.0  1   CT   S    1.8100   222.0000
1.0  1   CT   SH   1.8100   222.0000
1.0  1   CV   HC   1.0800   340.0000
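
The bond rows above pair an equilibrium length R0 with a force constant K2 for the form E = K2 * (R - R0)^2 given at the head of the quadratic_bond section. As an illustration only, and not as part of the listed programs, the form can be evaluated in Python as sketched below. The function name is hypothetical, and the units (angstroms for R and R0, kcal/mol per square angstrom for K2) are an assumption that the listing does not state.

```python
def quadratic_bond_energy(r, r0, k2):
    """Evaluate E = K2 * (R - R0)^2 for one bond.

    Units are assumed (not stated in the listing): r and r0 in angstroms,
    k2 in kcal/mol/angstrom^2.
    """
    return k2 * (r - r0) ** 2


# CT-CT row of the listing: R0 = 1.5260, K2 = 310.0000
R0_CT_CT = 1.5260
K2_CT_CT = 310.0

if __name__ == "__main__":
    for stretch in (0.0, 0.05, 0.10):
        energy = quadratic_bond_energy(R0_CT_CT + stretch, R0_CT_CT, K2_CT_CT)
        print(f"stretch {stretch:.2f} -> E = {energy:.4f}")
```

Note that the form carries no factor of one half, so K2 here is half of a conventional harmonic spring constant.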


1.0 
1.0  1   *   C    CJ   *   0.0000   0.0    3.1000  180.0   0.0000   0.0
1.0  1   *   C    CM   *   0.0000   0.0    3.1000  180.0   0.0000   0.0
1.0  1   *   C    CT   *   0.0000   0.0    0.0000    0.0   0.0000   0.0
1.0  1   *   C    N    *   0.0000   0.0   10.0000  180.0   0.0000   0.0
1.0  1   *   C    N*   *   0.0000   0.0    5.8000  180.0   0.0000   0.0
1.0  1   *   C    NA   *   0.0000   0.0    5.4000  180.0   0.0000   0.0
1.0  1   *   C    NC   *   0.0000   0.0    8.0000  180.0   0.0000   0.0
1.0  1   *   C    OH   *   0.0000   0.0    1.8000  180.0   0.0000   0.0
1.0  1   *   C*   C2   *   0.0000   0.0    0.0000    0.0   0.0000   0.0
1.0  1   *   C*   CB   *   0.0000   0.0    4.8000  180.0   0.0000   0.0
1.0  1   *   C*   CG   *   0.0000   0.0   23.6000  180.0   0.0000   0.0
1.0  1   *   C*   CT   *   0.0000   0.0    0.0000    0.0   0.0000   0.0
1.0  1   *   C*   CW   *   0.0000   0.0   23.6000  180.0   0.0000   0.0
1.0  1   *   C2   C2   *   0.0000   0.0    0.0000    0.0   2.0000   0.0
1.0  1   *   C2   CA   *   0.0000   0.0    0.0000    0.0   0.0000   0.0
1.0  1   *   C2   CC   *   0.0000   0.0    0.0000    0.0   0.0000   0.0
1.0  1   *   C2   CH   *   0.0000   0.0    0.0000    0.0   2.0000   0.0
1.0  1   *   C2   N    *   0.0000   0.0    0.0000    0.0   0.0000   0.0
1.0  1   *   C2   N2   *   0.0000   0.0    0.0000    0.0   0.0000   0.0
1.0  1   *   C2   N3   *   0.0000   0.0
0.0000 0.0

1.0 1 * 
0.0000 0.0 

1.0 1 ♦ 
0.0000 0.0 

1.0 1 ♦ 
0.0000 0.0 

1.0 1 * 
0.0000 0.0 

1.0 1 * 
0.0000 0.0 

1.0 1 ♦ 

5.3000 180.0 

1.0 1 ♦ 
10.2000 180.0 

1.0 1 * 

5.3000 180.0 

1.0 1 * 

3.7000 180.0 

1.0 1 * 

3.7000 180.0 

1.0 1 ♦ 
10.6000 180.0 

1.0 1 * 
0.0000 0.0 

1.0 1 * 

6.8000 180.0 

1.0 1 * 

6.0000 180.0 

1.0 1 * 

9.6000 180.0 

1.0 1 ♦ 
16.3000 180.0 

1.0 1 * 
20.0000 180.0 

1.0 1 * 

6.6000 180.0 

1.0 1 * 

5.1000 180.0 



1.4000 0.0 
C2 NT * 

1.0000 0.0 
C2 OH * 

0.5000 0.0 
C2 OS * 

1.4500 0.0 
C2 S ♦ 

1.0000 0.0 
C2 SH * 

0.7500 0.0 
CA CA * 

0.0000 0.0 
CA CB * 
0.0000 0.0 
CA CD ♦ 

0.0000 0.0 
CA CJ ♦ 

0.0000 0.0 
CA CM * 

0.0000 0.0 
CA CN » 
0.0000 0.0 
CA CT * 

0.0000 0.0 
CA N2 * 

0.0000 0.0 
CA NA * 

0.0000 0.0 
CA NC * 

0.0000 0.0 
CB CB * 
0.0000 0.0 
CB CN * 
0.0000 0.0 
CB N* ♦ 

0.0000 0.0 
CB NB * 

0.0000 0.0 



0.0000 0.0 

0.0000 0.0 

0.0000 0.0 

0.0000 0.0 

0.0000 0.0 

0.0000 0.0 

0.0000 0.0 



0.0000 



0.0000 
0.0000 



0.0000 
0.0000 



0.0000 
0.0000 
0.0000 
0.0000 



0.0 



0.0000 0.0 



0.0 



0.0 



0.0000 0.0 



0.0 



0.0 



0.0000 0.0 



0.0 
0.0 

0.0 

0.0 
1.0 3 *

8.3000 180.0 

1.0 1 * 
14.3000 180.0 

1.0 1 ♦ 
15,9000 180.0 

1.0 1 * 

0.0000 0.0 

1,0 1 ♦ 
14.3000 180. 0 

1.0 1 * 
15.9000 180.0 

1.0 1 * 

5.6000 180.0 

1.0 1 * 

4.8000 180.0 

1.0 1 * 

5.3000 180.0 

1.0 1 ♦ 

5.3000 180.0 

1.0 1 ♦ 

6.7000 180.0 

1.0 1 ♦ 
20.0000 180.0 

1.0 1 ♦ 

4.8000 180.0 

1.0 1 ♦ 

6.0000 180.0 

1.0 1 * 

0.0000 0.0 

1,0 1 * 

0.0000 0.0 

1.0 1 ♦ 

0.0000 0.0 

1.0 1 * 

0.0000 0.0 

1.0 1 * 

0.0000 0.0 

1,0 1 * 



CB NC * 

0.0000 0.0 
CC CF * 
0.0000 0.0 
CC CG * 
0.0000 0.0 
CC CT * 

0.0000 0.0 
CC cv ♦ 
0.0000 0.0 
CC cw ♦ 
0.0000 0.0 
CC NA * 

0.0000 0.0 
CC NB * 

0.0000 0.0 
CD CD * 

0.0000 0.0 
CD CN ♦ 

O.OOOD 0.0 
CE N* * 

0.0000 0.0 
CE NB ♦ 
0.0000 0,0 
CF KB ♦ 

0.0000 0.0 
CG NA * 

0.0000 0.0 
CH CH ♦ 

2.0000 0.0 
CH N ♦ 

O.OOOO 0.0 
CH N* * 

0.0000 0.0 
CH NT * 

1.0000 0.0 
CH OH * 

0,5000 0.0 
CH OS * 
0.0000 0.0

0-0000 0.0 

0.0000 0.0 

0,0000 0.0 

0.0000 0.0 

0.0000 0,0 

0.0000 0.0 



0.0000 



0.0000 

0.0000 
0,0000 
0.0000 
0.0000 
0.0000 
0.0000 
0.0000 



0.0 



0.0000 0.0 



0.0000 0.0 



0.0 



0.0 



0.0 



0.0 



0.0 



0.0 



0.0 



0.0 



0.0000 0.0 



0.0000 0.0 
0.0000 0.0

1.0 1 ♦ 
13.5000 180.0 

1.0 1 * 
24.4000 180.0 

1.0 1 * 
24.4000 180.0 

1.0 1 * 
7.4000 180.0 

1.0 1 * 
6.7000 180.0 

1.0 1 ♦ 
20.0000 180.0 

1.0 1 * 
24.4000 180.0 

1.0 1 * 
0.0000 0.0 

1.0 1 ♦ 
7.4000 180.0 

1.0 1 ♦ 
12.2000 180.0 

1.0 1 ♦ 
9.3000 180.0 

1.0 1 * 
10.0000 180.0 

1.0 1 * 
13.5000 180.0 

1.0 1 * 
9.3000 180.0 

1.0 1 * 
10.0000 180.0 

1.0 1 * 
0.0000 0.0 

1.0 1 * 
0.0000 0.0 

1.0 1 * 
0.0000 0.0 

1.0 1 * 
0.0000 0.0 



1.4500 0.0 
CI NC * 
0.0000 0.0 
CJ CJ * 
0.0000 0.0 
CJ CM * 
0.0000 0.0 
CJ N* ♦ 

0.0000 0.0 
CK N* * 

0.0000 0.0 
CK NB ♦ 
0.0000 0.0 
CM CM * 
0.0000 0.0 
CM CT * 

0.0000 0.0 
CM N* * 

0.0000 0.0 
CN NA * 
0.0000 0.0 
CP NA ♦ 

0.0000 0.0 
CP NB * 
0.0000 0.0 
CQ NC * 
0.0000 0.0 
CR NA * 

0.0000 0.0 
CR NB * 
0.0000 0.0 
CT CT * 

1.3000 0.0 
CT N * 

0.0000 0.0 
CT N* * 

0.0000 0.0 
CT N2 * 

0.0000 0.0 



0.0000 0.0 

0.0000 0.0 

0.0000 0.0 

0.0000 0.0 

0.0000 0.0 



0.0000 
0.0000 
0.0000 



0.0000 
0.0000 
0.0000 



0.0000 

0.0000 
0.0000 
0.0000 



0.0 



0.0 



0.0 



0.0000 0.0 



0.0 



0.0 



0.0 



0.0000 0.0 



0.0 

0.0 
0.0 
0.0 



0.0000 0.0 



0.0000 0.0 
1.0 1 *
0.0000 0.0 

1.0 1 * 
0.0000 0.0 

1.0 1 * 
0.0000 0.0 

1.0 1 * 
0.0000 0.0 

1.0 1 ♦ 
0.0000 0.0 

1.0 1 ♦ 
4.8000 180.0 

1.0 1 * 
6.0000 180.0 

1.0 1 * 
0.0000 0.0 

1.0 1 ♦ 
0.0000 0.0 

1.0 1 o 
0.0000 0.0 

1.0 1 0 
0.0000 0.0 

1.0 1 o 
0.0000 0.0 

1.0 1 0 
0.0000 0.0 

1.0 1 OS 
0.5000 0.0 

1.0 2 OH 
0.5000 0.0 

1.0 1 OS 
0.5000 0.0 

1.0 1 OS 
0.5000 0.0 

1.0 1 OS 
0.5000 0.0 

1.0 1 OH 
0.5000 0.0 

1.0 1 C2 



CT 
1 .4000 

CT 
0.5000 

CT 
1.1500 

CT 
1.0000 

CT 
0.7500 

CV 
0.0000 

CW 
0.0000 

OH 
0.7500 

OS 
0.7500 

C 

0.2000 
C 

0.1000 
C 

0.1000 
C 

0 .1000 

C2 
2.0000 

C2 
2 .0000 

C2 
2 .0000 

C2 
1.0000 

C2 
1.0000 

C2 
1.0000 

C2 



N3 ♦ 

0.0 
OH ♦ 

0.0 
OS ♦ 

0.0 
S * 

0.0 
SH * 

0.0 
NB * 

0.0 
NA ♦ 

0.0 
P * 

0.0 

p ♦ 

0.0 
C2 N 
180.0 
CH C2 
180.0 
CH N 
180.0 
CH CH 
180.0 
C2 OH 

0.0 
C2 OH 

0.0 
C2 OS 

0.0 
CH OS 

0.0 
CH OH 

0.0 
CH OH 

0.0 
S LP 
0.0000 0.0

0.0000 0.0 

0.0000 0.0 

0.0000 0.0 

0.0000 0.0 

0.0000 0.0 

0.0000 0.0 

0.0000 0.0 

0.0000 0.0 

0.0000 0.0 

0.0000 0.0 

0.0000 0.0 

0.0000 0.0 

0.0000 0.0 

0.0000 0.0 

0.0000 0.0 

0.0000 0.0 

0.0000 0.0 

0.0000 0.0 

0.0000 0.0 
0.0000

1.0 
0.0000 

1.0 
0.5000 

1.0 
0.5000 

1.0 
0.5000 

1.0 
0.5000 

1.0 
1 .7100 

1.0 
6.5900 

1.0 
6 .5900 

1.0 
6.5900 

1.0 
9.5100 

1.0 
1.7100 

1.0 
9.5100 

1.0 
6.5900 

1.0 
0.0000 

1.0 
0.0000 

1.0 
0.0000 

1.0 
0.2000 

1.0 
0.5000 

1.0 
0.5000 



0.0 
1 CH 

0.0 
1 OS 

0.0 
1 OH 

0.0 
1 OS 

0.0 
1 OS 

0.0 
1 HC 
IBO.O 
1 C 
180.0 
1 N* 
180.0 
1 CA 
180. 0 
1 N* 
180.0 
1 HC 
180.0 
1 N* 
180.0 
1 N* 
180.0 
1 N 

0.0 
1 HC 

0.0 
1 CT 

0.0 
1 CT 
180.0 
1 OS 

0.0 
1 OS 
0.0 



0.0000 

C2 
0.0000 

CH 
1 . 0000 

CH 
0.5000 

CH 
0.5000 

CH 
0.5000 

CM 
0.0000 

CM 
0.0000 

CM 
0 .0000 

CM 
0.0000 

CM 
0 .0000 

CM 
0.0000 

CM 
0.0000 

CM 
0 .0000 

CT 
0.0670 

CT 
0.0670 

CT 
0.0670 

OS 
0.3830 

CT 
0.1440 

CT 
0.1440 



0.0 
SH LP 

0.0 
C2 OH 

0.0 
CH OH 

0.0 
CH OH 

0.0 
CH OS 

0.0 
CM CT 

0.0 
CM HC 

0.0 
CM CT 

0.0 
CM HC 

0.0 
CM CA 

0.0 
CM HC 

0.0 
CM C 

0.0 
CM HC 

0.0 
C 0 
180.0 
C 0 
180.0 
C 0 
180.0 
CT CT 

0.0 
CT OS 

0.0 
CT OH 

0.0 



0.0000 0.0 

0.0000 0.0 

0.0000 0.0 

0.0000 0.0 

0.0000 0.0 

0.0000 0.0 

0.0000 0.0 

0.0000 0.0 

0.0000 0.0 

0.0000 0.0 

0.0000 0.0 

0.0000 0.0 

0.0000 0.0 

0.0000 0.0 

0.0000 0.0 

0.0000 0.0 

0.0000 0.0 

0.0000 0.0 

0.0000 0.0 
1.0 1 OH CT CT OH

0.5000 0.0 0.1440 0.0 

1.0 1 H N C 0 

2.5000 180.0 0.0000 0.0 

1.0 1 C2 OS C2 C3 

0.1000 0.0 0.7250 0.0 

1.0 1 C2 OS C2 C2 

0.1000 0.0 1.4500 0.0 

1.0 1 C3 OS C2 C3 

0.1000 0.0 1.4500 0.0 

1.0 1 CH OS CH C2 

0,1000 0.0 0.7250 0.0 

1,0 1 CH OS CH CH 

0.1000 0.0 0.7250 0.0 

1.0 1 C2 OS CH C2 

0.1000 0.0 0.7250 0.0 

1,0 1 C3 OS CH C3 

0.1000 0.0 0.7250 0.0 

1.0 1 CH OS CH N* 

0.0000 0.0 0.7250 0.0 

1 . 0 1 C2 OS CH C3 

0.1000 0.0 0.7250 0.0 

1.0 1 OH P OS C3 

0.7500 0.0 0.2500 0.0 

1.0 1 OS P OS C2 

0.7500 0.0 0.2500 0.0 

1.0 1 OH P OS C2 

0.7500 0.0 0.2500 0,0 

1.0 1 OS P OS CT 

0.7500 0.0 0.2500 0.0 

1.0 1 OS P OS CH 

0.7500 0.0 0.2500 0.0 

1.0 1 OS P OS C3 

0.7500 0.0 0.2500 0.0 

1.0 1 OH P OS CH 

0.7500 0.0 0.2500 0.0 

1.0 1 OH P OS CT 

0.7500 0.0 0,2500 0.0 

1.0 1 LP S S LP 
0.0000 0.0

0.6500 0.0 

0,0000 0.0 

0.0000 0.0 

0.0000 0.0 

0.0000 0.0 

0.0000 0.0 

0.0000 0.0 

0.0000 0.0 

0.0000 0.0 

0.0000 0.0 

0.0000 0,0 

0.0000 0.0 

0,0000 0.0 

0.0000 0.0 

0.0000 0.0 

0.0000 0,0 

0.0000 0,0 

0.0000 0.0 

0.0000 0.0 
0.0000 0.0
1.0 1 LP 

0.0000 0.0 
1.0 1 C2 

3.5000 0.0 
1.0 1 CT 

3.5000 0.0 

1.0 1 LP 
0.0000 0.0 

1.1 4 ♦ 
0.0000 0.0 

1.1 4 * 
0.0000 0.0 

1.1 4 ♦ 
0.0000 0.0 

1.1 4 * 
0.0000 0.0 

1.1 4 ♦ 
0.0000 0.0 

1.1 4 * 
0.0000 0.0 

1.14 ♦ 
0.0000 0.0 

1.1 4 * 
0.0000 0.0 

1.1 4 ♦ 
0.0000 0.0 

1.1 4 * 
0.0000 0.0 

1.1 4 ♦ 
0.0000 0.0 

1.14 * 
0.0000 0.0 

1.14 * 
0.0000 0.0 

1.1 4 ♦ 
10.0000 IBO.O 
1.14 * 
0.0000 0.0 
0.0000 0.0

S S C2 

0.0000 0.0 

S S C2 

0.6000 0.0 

S S CT 

0.6000 0.0 

S S CT 

0.0000 0.0 

cs cs ♦ 

1.0210 0.0 

CS CT ♦ 
1.0210 0.0 

AC CS ♦ 
1.0210 0.0 

BC CS * 
1.0210 0.0 

CS OT ♦ 
0.4430 0.0 

CS OE * 
0.9280 0.0 

AC OE * 
0.92B0 0.0 

BC OE * 
0.9260 0.0 

AC OA ♦ 
0.0000 0.0 

BC OB * 
0.0000 0.0 

CS OA * 
0.0000 0.0 

CS OB * 
0.0000 0.0 

CS N * 
0.0000 0.0 

C N * 

0.0000 0.0 

c cs * 

0-0000 0.0 



0 .0000 
0 .0000 
0 .0000 



0.0000 
0 .0000 
0.0000 



0.0000 
0.0000 
0.0000 
0.0000 
0.0000 
0.0000 
0.0000 
0.0000 
0.0000 



0.0 
0.0 
0.0 



0.0000 0.0 



0.0 
0.0 
0.0 



0.0000 0.0 



0.0 



0.0 



0.0 



0.0 



0.0 



0.0 



0.0 



0.0 



0.0 



0.0000 0.0 



0.0000 0.0 
1.1 4 OE
0.0000 0.0 

1.1 4 AH 
1.7500 60.0 

1.1 4 CS 
0.0000 0.0 

1.1 4 OE 
0.0000 0.0 

1.14 AH 
1.7500 60.0 

1.1 4 CS 
0.0000 0.0 

1.1 4 OE 
0.0000 0.0 

1.1 4 BH 
1.2500 240.0 

1.1 4 CS 
0.0000 0.0 

1.1 4 OE 
0.0000 0.0 

1.1 4 BH 
1.2500 240.0 

1.1 4 CS 
0.0000 0.0 

1.1 4 HT 
0.0000 0.0 

1.1 4 HT 
0.0000 0.0 

1.1 4 H 
2.5000 180.0 

1.1 4 HT 
0.0000 0.0 

1.0 1 $$ 
0.0000 0.0 

1.0 1 $$ 
5.3000 180.0 

1.0 1 $$ 
16.3000 180.0 

1.0 1 $$ 



AC OA CS 
0.0000 0.0 

AC OA CS 
0.0000 0.0 

AC OA CS 
0.8500 0.0 

AC OA HY 
0.0000 0.0 

AC OA HY 
0.0000 0.0 

AC OA HY 
0.8500 0.0 

EC OB CS 
0.0000 0.0 

EC OB CS 
0.0000 0.0 

BC OB CS 
1.4000 0.0 

BC OB HY 
0.0000 0.0 

BC OB HY 
0.0000 0.0 

BC OB HY 
1.4000 0.0 

AC OA CS 
0.8500 0.0 

BC OB CS 
1.4000 0.0 

N C 0 

0.0000 0.0 

CS C 0 

0.0670 180.0 

C$1 C$1 $$ 
1.3000 0.0 

C$2 C$2 $$ 
0.0000 0.0 

C$3 C$3 $$ 

0.0000 0.0 

C$5 C$5 $$ 
2.1500 300.0

0.0000 0.0 

0.0000 0.0 

2.1500 300.0 

0.0000 0.0 

0.0000 0.0 

■1.0500 0.0 



0.0000 
0.0000 
■1.0500 
0.0000 
0.0000 
0.0000 
0.0000 
0.6500 
0.0000 
0 .0000 



0.0 
0.0 
0.0 
0.0 
0.0 
0.0 
0.0 
0.0 
0.0 
0.0 



0.0000 0.0 

0.0000 0.0 
0.0000 0.0 
0.0000 180.0

1.0 1 $$ 
0.0000 0.0 

1.0 1 $$ 
0.0000 0.0 

1.0 1 $$ 
5.8000 180.0 

1.0 1 $$ 
10.0000 180.0 

1.0 1 $5 
0.0000 0.0 



0.0000 0.0 

CSl 051 $$ 

1.1000 0.0 

C$1 N$l S$ 

0,3000 0.0 

C$2 N$2 $$ 

0.0000 0.0 

C$3 N$3 $$ 
0.0000 0.0 

C$1 S$l $$ 

0.7500 0.0 



1.0 



5$ S$l SSI $$ 



3.5000 0.0 

1.0 1 $$ 
0.0000 0.0 

1.0 1 $5 
0.0000 0.0 

1.0 1 $$ 
0.0000 0.0 

1.0 1 $$ 
0.0000 0.0 



0.6000 0.0 

OSl 0$1 $$ 

1,1000 0.0 

0$1 N$l $$ 

1.1000 0.0 

0$1 P$l $$ 

0.7500 0.0 

NSl N$l $$ 

0.3000 0.0 



#out_of_plane amber 

> E = Kchi * [ 1 + cos(n*Chi - Chi0) ]
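
As an illustration of the out-of-plane form just stated, and not as part of the listed programs, the expression can be evaluated in Python as sketched below. The function name is hypothetical. Treating Chi and Chi0 as degrees is an assumption suggested by the Chi0 values of 180.0000 in the table, and the parameter values in the example are illustrative rather than taken from a specific row.

```python
import math


def out_of_plane_energy(chi_deg, kchi, n, chi0_deg):
    """Evaluate E = Kchi * [1 + cos(n*Chi - Chi0)].

    Angles are assumed to be in degrees (an assumption; the listing
    does not state the unit).
    """
    return kchi * (1.0 + math.cos(math.radians(n * chi_deg - chi0_deg)))


if __name__ == "__main__":
    # Illustrative parameters: Kchi = 10.0, n = 2, Chi0 = 180 degrees.
    for chi in (0.0, 45.0, 90.0):
        print(chi, out_of_plane_energy(chi, 10.0, 2, 180.0))
```

With n = 2 and Chi0 = 180 the energy vanishes at Chi = 0 and peaks at 2*Kchi when Chi = 90, so the planar arrangement is the minimum.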
!Ver Ref I J K L 

ChiO 



1.0 3 
180.0000 

1.0 1 
180 .0000 

1.0 1 
180.0000 

1.0 1 
180.0000 

1,0 1 
180,0000 

1.0 1 
180.0000 



N3 



NA CA CA 



CH C2 



C3 CA CH C3 



NT CH C3 



N3 



CH CH 



H2 N2 CH H2 



0.0000 
0.0000 
0.0000 
0.0000 



0.0000 
0 .0000 
0.0000 
0.0000 
0.0000 



Kchi 



0.0000 
7.0000 
7,0000 
14.0000 
7,0000 
0.0000 



0.0 



0.0 



0.0 



0.0 



0.0000 0.0 



0.0 
0.0 
0.0 
0.0 
0.0 



n 



2 
3 



3 
3 
1.0 1

180 . 0000 
1.0 1 
180.0000 
1.0 1 
180.0000 
1.0 1 
180.0000 
1.0 1 
180.0000 
1.0 1 
180.0000 
1.0 1 
180.0000 
1.0 1 

180:0000 

1.0 1 
180.0000 
1.0 1 
180. 0000 
1.0 1 
180.0000 
1.0 1 
180 , 0000 
1.0 1 
180.0000 
1.0 1 
180.0000 
1.0 1 

180.0000 

1.0 1 
180.0000 

1.0 1 

180.0000 

1.0 1 
180.0000 

1.0 1 
180.0000 

1.0 1 



02 



CH C2 

CH CH 

CC CC 

CC CB 



C N 

C2 N 

CT N 

H2 N 

N2 CA 



C N3 

O C 

HC C* 

HC CW 

CB Ctt 

CN CB 

C* CB 

CA CB 

CA CN 



CH 



CH 



CT 



H2 



N2 

C 02 
NT CH 



CH 



14.0000 3 

14.0000 3 



0.0000 
0.0000 
14.0000 
1.0000 
1.0000 
1.0000 
10.5000 
10.5000 
14.0000 
14 .0000 
10.5000 
0.0000 
0.0000 
0.0000 
0.0000 
0.0000 
0.0000 
0.0000 



2 
2 
2 
180.0000

1.0 1 NA CN * * 0.0000

180.0000

1.0 1 HC CA * * 2.0000

180.0000

1.0 1 H N * * 1.0000

180.0000

1.0 1 H2 N2 * * 1.0000

180.0000

1.0 1 H3 N2 * * 1.0000

180.0000

1.0 1 H2 NT * * 1.0000

180.0000

1.0 1 H NA * * 1.0000

180.0000

1.0 1 $$ $$ $$ $$ 10.0000

180.0000

#nonbond(12-6) amber

@type r-eps
@combination arithmetic

> E = EPSij * { (Rij*/Rij)^12 - 2 (Rij*/Rij)^6 }

> where EPSij = sqrt( EPSi * EPSj )

>       Rij* = (Ri* + Rj*)/2
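
As an illustration of the 12-6 form and combination rules just stated, and not as part of the listed programs, a pair energy can be computed in Python as sketched below. The function name is hypothetical. The example uses the C row of the table that follows (Ri* = 3.7000, EPSi = 0.12000), and the units are whatever the listing intends, which it does not state.

```python
import math


def nonbond_12_6(r, ri_star, eps_i, rj_star, eps_j):
    """Evaluate E = EPSij * ((Rij*/r)^12 - 2 (Rij*/r)^6).

    EPSij is the geometric mean of the per-atom well depths and Rij* is
    the arithmetic mean of the per-atom Ri* values, as the listing states.
    """
    eps_ij = math.sqrt(eps_i * eps_j)
    r_star_ij = 0.5 * (ri_star + rj_star)
    ratio = r_star_ij / r
    return eps_ij * (ratio ** 12 - 2.0 * ratio ** 6)


if __name__ == "__main__":
    # Two atoms of type C: Ri* = 3.7000, EPSi = 0.12000.
    for r in (3.3, 3.7, 4.5):
        print(r, nonbond_12_6(r, 3.7, 0.12, 3.7, 0.12))
```

In this form the minimum lies at r = Rij* with depth -EPSij, which is why the table stores Ri* and EPSi directly rather than A and B coefficients.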



!Ver Ref I    Ri*      EPSi

1.0  3   IM   5.0000   0.10000
1.0  3   CU   2.4000   0.05000
1.0  3   I    4.8000   0.40000
1.0  3   OW   3.5360   0.15200
1.0  3   MG   2.3400   0.10000
1.0  3   CO   3.2000   0.10000
1.0  3   QC   6.8000   0.00008
1.0  3   QK   5.3200   0.00033
1.0  3   QL   2.2800   0.01800
1.0  3   QN   3.7400   0.00280
1.0  3   QR   5.9200   0.00017
1.0  1   C    3.7000   0.12000
1.0  1   C*   3.7000   0.12000
1.0  1   C2   3.8400   0.12000
1.0  1   C3   4.0000   0.15000
1.0  1   CA   3.7000   0.12000
1.0  1   CB   3.7000   0.12000
1.0  1   CC   3.7000   0.12000
1.0  1   CD   3.7000   0.12000
1.0  1   CE   3.7000   0.12000
1.0  1   CF   3.7000   0.12000
1.0  1   CG   3.7000   0.12000
1.0  1   CH   3.7000   0.09000
1.0  1   CI   3.7000   0.12000
1.0  1   CJ   3.7000   0.12000
1.0  1   CK   3.7000   0.12000
1.0  1   CM   3.7000   0.12000
1.0  1   CN   3.7000   0.12000
1.0  1   CP   3.7000   0.12000
1.0  1   CQ   3.7000   0.12000
1.0  1   CR   3.7000   0.12000
1.0  1   CT   3.6000   0.06000
1.0  1   CV   3.7000   0.12000
1.0  1   CW   3.7000   0.12000
1.0  1   H    2.0000   0.02000
1.0  1   H2   2.0000   0.02000
1.0  1   H3   2.0000   0.02000
1.0  1   HC   3.0800   0.01000
1.0  1   HO   2.0000   0.02000
1.0  1   HS   1.0000   0.02000
1.0  1   LP   2.4000   0.01600
1.0  1   N    3.5000   0.16000
1.0  1   N*   3.5000   0.16000
1.0  1   N2   3.5000   0.16000
1.0  1   N3   3.7000   0.08000
1.0  1   NA   3.5000   0.16000
1.0  1   NB   3.5000   0.16000
1.0  1   NC   3.5000   0.16000
1.0  1   NP   3.5000   0.16000
1.0  1   NT   3.7000   0.12000
1.0  1   O    3.2000   0.20000
1.0  1   O2   3.2000   0.20000
1.0  1   OH   3.3000   0.15000
1.0  1   OS   3.3000   0.15000
1.0  1   P    4.2000   0.20000
1.0  1   S    4.0000   0.20000
1.0  1   SH   4.0000   0.20000
1.1  4   CS   3.6000   0.09030
1.1  4   AC   3.6000   0.09030
1.1  4   BC   3.6000   0.09030
1.1  4   C    3.7000   0.12000
1.1  4   H    2.0000   0.02000
1.1  4   HY   1.6000   0.04980
1.1  4   HT   2.9360   0.00450
1.1  4   HO   2.0000   0.02000
1.1  4   AH   2.9360   0.00450
1.1  4   BH   2.9360   0.00450
1.1  4   OT   3.2000   0.15910
1.1  4   OA   3.2000   0.15910
1.1  4   OB   3.2000   0.15910
1.1  4   OE   3.2000   0.15910
1.1  4   OH   3.3000   0.15000
1.1  4   O    3.2000   0.20000
1.1  4   N    3.5000   0.16000





#hydrogen_bond(10-12) amber

> E = Aij/r^12 - Bij/r^10
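
As an illustration of the 10-12 form just stated, and not as part of the listed programs, the energy and the position of its minimum can be computed in Python as sketched below. The function names are hypothetical. Setting dE/dr = 0 gives r^2 = 12*Aij / (10*Bij), which the second function uses. The example coefficients are the H...OS pair from the table that follows (A = 7557, B = 2385).

```python
def hbond_10_12(r, a_ij, b_ij):
    """Evaluate E = Aij/r^12 - Bij/r^10."""
    return a_ij / r ** 12 - b_ij / r ** 10


def hbond_minimum(a_ij, b_ij):
    """Return (r_min, E_min) from dE/dr = 0, i.e. r_min^2 = 12*A / (10*B)."""
    r_min = (1.2 * a_ij / b_ij) ** 0.5
    return r_min, hbond_10_12(r_min, a_ij, b_ij)


if __name__ == "__main__":
    # H...OS pair: A = 7557.0000, B = 2385.0000
    r_min, e_min = hbond_minimum(7557.0, 2385.0)
    print(f"r_min = {r_min:.3f}, E_min = {e_min:.3f}")
```

The same two functions apply to any A and B pair in the table, including the much larger sulfur coefficients.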










!Ver Ref I    J    A             B

1.0  3   H    OS   7557.0000     2385.0000
1.0  3   H    OW   7557.0000     2385.0000
1.0  3   H2   OS   7557.0000     2385.0000
1.0  3   H2   OW   7557.0000     2385.0000
1.0  3   HW   NB   7557.0000     2385.0000
1.0  3   HW   NC   10238.0000    3071.0000
1.0  3   HW   O    7557.0000     2385.0000
1.0  3   HW   O2   4019.0000     1409.0000
1.0  3   HW   OH   7557.0000     2385.0000
1.0  3   HW   OS   7557.0000     2385.0000
1.0  3   HW   S    265720.0000   35429.0000
1.0  3   HW   SH   265720.0000   35429.0000
1.0  1   H    NB   7557.0000     2385.0000
1.0  1   H    NC   10238.0000    3071.0000



290 



wo 9600849 



1.0  1  H   O2    4019.0000
1.0  1  H   O     7557.0000
1.0  1  H   OH    7557.0000
1.0  3  H   S   265720.0000
1.0  3  H   SH  265720.0000
1.0  1  HO  NB    7557.0000
1.0  1  HO  NC    7557.0000
1.0  1  HO  O2    4019.0000
1.0  1  HO  O     7557.0000
1.0  1  HO  OH    7557.0000
1.0  3  HO  S   265720.0000
1.0  3  HO  SH  265720.0000
1.0  1  H2  NB    4019.0000
1.0  1  H2  NC    4019.0000
1.0  1  H2  O2    4019.0000
1.0  1  H2  O    10238.0000
1.0  1  H2  OH    4019.0000
1.0  3  H2  S   265720.0000
1.0  3  H2  SH  265720.0000
1.0  1  H3  NB    4019.0000
1.0  1  H3  NC    4019.0000
1.0  1  H3  O2    4019.0000
1.0  1  H3  O     7557.0000
1.0  1  H3  OH    7557.0000
1.0  3  H3  S   265720.0000
1.0  3  H3  SH  265720.0000
1.0  1  HS  NB   14184.0000
1.0  1  HS  NC   14184.0000
1.0  1  HS  O2   14184.0000
1.0  1  HS  O    14184.0000
1.0  1  HS  OH   14184.0000
1.0  3  HS  S   265720.0000
1.0  3  HS  SH  265720.0000


#bond_increments

!Ver  Ref  I   J   DeltaIJ
 1.1   5   CM  CM   0.000
 1.1   5   CA  CA   0.000
 1.1   5   CB  CB   0.000




1409.0000 

2385.0000

2385.0000 
35429 .0000 
35429.0000 

2385.0000 

2385.0000 

1409 .0000 

2385.0000 

2385.0000 
35429.0000 
35429.0000 

1409.0000 

1409.0000 

1409 .0000 

3071.0000 

1409.0000 
35429.0000 
35429.0000 

1409.0000 

1409.0000 

1409.0000 

2385.0000 

2385.0000 
35429.0000 
35429.0000 

3082.0000 

3082.0000 

3082.0000 

3082.0000 

3082.0000 
35429.0000 
35429.0000 

DeltaJI 



0.000 
0.000 
0.000 
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X * X 


c 




pfi 
wo 


0 ODO 


U • W Vf V 


± • X 


c 




PT* 

W 


0 . 000 


n nnn 


1 1 

X m X 




HT 


PT 

W X 


0 . 066 


-0 . 066 


X.J. 


c 
? 


TI 

XI 


NT 

*rt X 


0 . 133 


-0 135 


X • X 




NT 


CT 


-0 . 189 


0.189 


X w X 




CA 


OH 


0 .334 


-0 334 

w . w w ~s 








OS 


0 * 237 


-0 237 


1 1 

X • X 






PT 

W X 


0 . 066 


-0 . 066 


-I 1 

X • X 




PS 


PS 

W w 


0 . 000 


0 . 000 


X • X 






PS 

W w 


0 . 000 


0 000 








PS 

W w 


0 . 000 


0 , 000 


X . X 


^: 
o 




PT 
w ^ 


0 000 


0 000 


X • X 


c 
o 




O*^ 


0 2 00 


-0 200 


X • X 


q 


N* 


PS 

Ww 


-0,183 


0 1 fl"^ 

w * X O J 


1 1 

X * X 




OT 

w X 


WY 

Xl X 


-0 400 


0 400 


1 1 

X • X 


i: 

D 




Xl X 


-0.400 


0 400 


X > X 


D 


OR 


HY 

XI X 


* 0 . 400 


n 400 


1 1 

X • X 


C 

V 


P^ 


HT 

il X 


-0 100 

W • X U w 


n ion 

w » X W w 


X • X 


c 


AP 


AH 


-0,100 


0 100 


X * X 


D 


PP 


PH 

CXI 


-0.100 


0 100 

L/ * X V V/ 


X • X 


C 
D 


AP 


HT 

xl ± 


u . X U w 


n T on 

W • X w w 


X • X 


O 


RP 


HT 


-0 100 


0 100 

V/ • X w w 


X • X 


C 


AP 


OA 


0 2S0 


- 0 250 


X . X 


o 


PP 


OP 


0 250 


. 0 250 


X • X 




PQ 


OA 


0 250 

U • X W \/ 


- 0 250 


1 1 

X • X 


V 


P^ 


OB 


0 . 250 


-0 . 250 


1 1 




PS 


OT 

W X 


0 . 250 


- 0 . 250 




c 


PS 

Ww 


OE 


0 . 200 


-0 .200 


1 1 


g 


AP 


OE 


0 ,200 


-0 . 200 


1 . 1 


5 


BP 

&w 


OE 


0 .200 


-0 . 200 




c 


OW 


HW 

Xxr* 


-0 .380 


0 . 380 


1 1 

X • X 




111 


PT 
W X 


-0 . 183 


0 . 183 


X • X 




p 

* 


OS 


0 .254 


-0 . 254 


1.1  5  CB  N*    0.130  -0.130
1.1  5  CK  N*   -0.253   0.253
1.1  5  NC  CB
1.1  5  NB  CB    0.020  -0.020
1.1  5  CB  CA    0.000  -0.000
1.1  5  CK  NB    0.566  -0.566
1.1  5  CK  HC   -0.051   0.051
1.1  5  N2  CA   -0.162   0.162
1.1  5  NC  CA   -0.430   0.430
1.1  5  H2  N2    0.318  -0.318
1.1  5  CO  NC    0.341  -0.341
1.1  5  CQ  HC    0.005  -0.005
1.1  5  O2  P    -0.913   0.413
1.1  5  C   N*   -0.044   0.044
1.1  5  CM  N*    0.137  -0.137
1.1  5  NA  C    -0.255   0.255
1.1  5  O   C    -0.492   0.492
1.1  5  NA  H    -0.282   0.282
1.1  5  CM  C    -0.150   0.150
1.1  5  CM  CT    0.055  -0.055
1.1  5  CM  HC   -0.101   0.101
1.1  5  H2  CT    0.119  -0.119
1.1  5  C   NC    0.424  -0.424
1.1  5  CM  CA   -0.409   0.409
1.1  5  N2  HC   -0.037   0.037
1.1  5  OH  CT   -0.263   0.263
1.1  5  HO  OH    0.303  -0.303
1.1  5  C   CB   -0.005   0.005
1.1  5  NA  CA   -0.215   0.215
1.1  5  CT  N     0.171  -0.171
1.1  5  H   N     0.274  -0.274
1.1  5  C   CT    0.095  -0.095
1.1  5  C   N     0.139  -0.139
1.1  5  N2  CT    0.044  -0.044
1.1  5  H3  N2    0.551  -0.351
1.1  5  O2  C    -0.792   0.292
1.1  5  S   CT   -0.023   0.023
1.1  5  LP  S    -0.403   0.403
1.1  5  SH  CT   -0.033   0.033
1.1  5  HS  SH    0.127  -0.127
1.1  5  SH  LP    0.489  -0.489
1.1  5  CC  CT    0.007  -0.007
1.1  5  NB  CC   -0.256   0.256
1.1  5  CW  CC    0.018  -0.018
1.1  5  CR  NB    0.251  -0.251
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1 . 1 


5 


NA 


CR 


n t\C£L 


0 . 066 


1 . 1 


5 


CR 


HC 


ri t\c^ 
- 0 . 0o7 


0 . 067 


1 . 1 


5 


CJW 


VT7V 

NA 


n c "7 


0 . 057 


1 . 1 


3 


UW 


ilL. 


— n noQ 

- U . UJrif 


A n Q o 

u . uyy 


X . 1 


c 

D 






- n no n 
- u . u 


f\ ft "5 n 


1 . 1 






"DC 






X . X 




U V 




u . u J b 


f\ A "a c 

- U . U J D 


X . X 




U V 




u . / 


n o *7 


X . X 




V 


wn 
nu 




U , 


X * X 




14 J 


CT 


n on 
u . i7Ub 


U . UJyb 


X . X 




DiJ 




- U . b 


n "a ^ 
U . 


1 . 1 


5 


CA 


CT 


n n o 
- U . U J J 


n A ^ ^ 
0 . 033 


X . X 


c 


LIA 


ilu 


U , X UX 


U . XUX 


X . X 


c 

r> 




LI 


U • U Ub 


T\ O A t 


X . X 


c 

b 




r*M 
UW 


- u . xy ^ 


A TOO 
0 . X^^ 


X . X 


5 


CB 


C* 


- U . U4o 


U . 04 b 


X . X 


5 


CN 


NA 


U . X7 D 


A 1 "7 ^ 

- 0 , X7o 


1.1  5  CN  CA    0.074  -0.074
1.1  5  CB  CN    0.104  -0.104
1.1  5  CA  C    -0.181   0.181
1.1  5  OH  C    -0.081   0.081



#reference 1 
creation of file 
#reference 2 

Lone pair lp had incorrect mass of 0.001097.

Angle CT-C-O2 was by error included twice.

Torsion OH-C2-C2-OH was written as two separate lines.

Hence only one of the energy terms was included.

@Author Jon Hurley

@Date 13-December-90

#reference 3 

parameter set modified with the additional parameters
from kollman's parm89a rev a force field file
note that the HW...OW hydrogen bond parameters and
the HW van der waals parameters are not included in
the files since they are equal to zero in parm89a.
@Author tom thacher
@Date 11-March-92
#reference 4
homans' carbohydrate potential

@Author Tom Thacher

@Date 7-July-1992

#reference 5 

bond increments 

@Author Tom Thacher

@Date 7-July-1992

#end 

END OF LISTING 
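
The bond-increment table in the listing above pairs atom types I and J with two equal and opposite values, DeltaIJ and DeltaJI. The following is a minimal sketch of how such a table is commonly used, assuming the usual convention that an atom's partial charge is its formal charge plus the sum of the increments contributed by its bonds. Whether the programs of this specification apply the table in exactly this way is an assumption; the function names, the dictionary layout, and the three-atom example are hypothetical, and only the four rows copied into the dictionary come from the listing.

```python
# Four rows copied from the bond-increment table: (I, J) -> (DeltaIJ, DeltaJI).
BOND_INCREMENTS = {
    ("CN", "CA"): (0.074, -0.074),
    ("CB", "CN"): (0.104, -0.104),
    ("CA", "C"): (-0.181, 0.181),
    ("OH", "C"): (-0.081, 0.081),
}


def increment(type_i, type_j):
    """Charge added to an atom of type_i by its bond to an atom of type_j."""
    if (type_i, type_j) in BOND_INCREMENTS:
        return BOND_INCREMENTS[(type_i, type_j)][0]
    if (type_j, type_i) in BOND_INCREMENTS:
        return BOND_INCREMENTS[(type_j, type_i)][1]
    raise KeyError(f"no bond increment for {type_i}-{type_j}")


def partial_charges(atom_types, bonds, formal_charges=None):
    """Assumed convention: charge = formal charge + sum of bond increments."""
    charges = list(formal_charges) if formal_charges else [0.0] * len(atom_types)
    for i, j in bonds:
        charges[i] += increment(atom_types[i], atom_types[j])
        charges[j] += increment(atom_types[j], atom_types[i])
    return charges


# Hypothetical fragment: CA bonded to C, and OH bonded to the same C.
example = partial_charges(["CA", "C", "OH"], [(0, 1), (2, 1)])
```

Because each row is antisymmetric, the increments cancel pairwise, so the total charge of a fragment equals the sum of its formal charges.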
DATA FILE FOR H BOND FORCES - HBOND.DAT 



47 !data items

!BIOSYM forcefield 2

!version amber.frc 1.0 19-Oct-90

!version amber.frc 1.1 8-Aug-92
!define amber

! This is the new format version of the amber forcefield
!hbond_definition amber

1.0  1  distance   2.5000
1.0  1  angle      90.0000
1.0  1  donors     H HO H2 H3 HS
1.0  1  acceptors  NB NC O2 O OH S SH

!hydrogen_bond(10-12) amber

! E = Aij/r^12 - Bij/r^10

!Ver Ref I   J         A            B
1.0  3  H   OS    7557.0000   2385.0000
1.0  3  H   OW    7557.0000   2385.0000
1.0  3  H2  OS    7557.0000   2385.0000
1.0  3  H2  OW    7557.0000   2385.0000
1.0  3  HW  NB    7557.0000   2385.0000
1.0  3  HW  NC   10238.0000   3071.0000
1.0  3  HW  O     7557.0000   2385.0000
1.0  3  HW  O2    4019.0000   1409.0000
1.0  3  HW  OH    7557.0000   2385.0000
1.0  3  HW  OS    7557.0000   2385.0000
1.0  3  HW  S   265720.0000  35429.0000
1.0  3  HW  SH  265720.0000  35429.0000
1.0  1  H   NB    7557.0000   2385.0000
1.0  1  H   NC   10238.0000   3071.0000
1.0  1  H   O2    4019.0000   1409.0000
1.0  1  H   O     7557.0000   2385.0000
1.0  1  H   OH    7557.0000   2385.0000
1.0  3  H   S   265720.0000  35429.0000
1.0  3  H   SH  265720.0000  35429.0000
1.0  1  HO  NB    7557.0000   2385.0000
1.0  1  HO  NC    7557.0000   2385.0000
1.0  1  HO  O2    4019.0000   1409.0000
1.0  1  HO  O     7557.0000   2385.0000
1.0  1  HO  OH    7557.0000   2385.0000
1.0  3  HO  S   265720.0000  35429.0000
1.0  3  HO  SH  265720.0000  35429.0000
1.0  1  H2  NB    4019.0000   1409.0000
1.0  1  H2  NC    4019.0000   1409.0000
1.0  1  H2  O2    4019.0000   1409.0000
1.0  1  H2  O    10238.0000   3071.0000
1.0  1  H2  OH    4019.0000   1409.0000
1.0  3  H2  S   265720.0000  35429.0000
1.0  3  H2  SH  265720.0000  35429.0000
1.0  1  H3  NB    4019.0000   1409.0000
1.0  1  H3  NC    4019.0000   1409.0000
1.0  1  H3  O2    4019.0000   1409.0000
1.0  1  H3  O     7557.0000   2385.0000
1.0  1  H3  OH    7557.0000   2385.0000
1.0  3  H3  S   265720.0000  35429.0000
1.0  3  H3  SH  265720.0000  35429.0000
1.0  1  HS  NB   14184.0000   3082.0000
1.0  1  HS  NC   14184.0000   3082.0000
1.0  1  HS  O2   14184.0000   3082.0000
1.0  1  HS  O    14184.0000   3082.0000
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1.0 1 HS 
1.0 3 HS
1.0 3 HS 
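
The HBOND.DAT header gives the 10-12 form E = Aij/r^12 - Bij/r^10 for a donor hydrogen I and an acceptor J. The following is a minimal sketch of evaluating that expression with two (A, B) pairs copied from the table. The function names are hypothetical, and the units (distance in angstroms, energy in kcal/mol) are an assumption based on common AMBER usage, not something the data file states.

```python
import math

# (donor hydrogen type, acceptor type) -> (A, B), copied from the table.
HBOND_PARAMS = {
    ("H", "NB"): (7557.0, 2385.0),
    ("H", "NC"): (10238.0, 3071.0),
}


def hbond_energy(donor_h, acceptor, r):
    """10-12 hydrogen-bond energy E = A/r^12 - B/r^10 at separation r."""
    a, b = HBOND_PARAMS[(donor_h, acceptor)]
    return a / r**12 - b / r**10


def hbond_minimum(donor_h, acceptor):
    """Separation and energy at the minimum, from dE/dr = 0: r^2 = 6A / (5B)."""
    a, b = HBOND_PARAMS[(donor_h, acceptor)]
    r_min = math.sqrt(6.0 * a / (5.0 * b))
    return r_min, hbond_energy(donor_h, acceptor, r_min)


r_min, e_min = hbond_minimum("H", "NB")
```

For the H...NB pair the minimum of this curve falls near 1.95, below the 2.5000 value listed under "distance" in the hbond definition block.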

DATA FILE FOR LENNARD JONES FORCES - LJ_PARAM.DAT

74 !total atoms
!BIOSYM forcefield 2
!version amber.frc 1.0 19-Oct-90
!version amber.frc 1.1 8-Aug-92
!define amber

! This is the new format version of the amber forcefield
!nonbond(12-6) amber
!type r-eps

!combination arithmetic

! E = EPSij * ((Rij*/Rij)^12 - 2(Rij*/Rij)^6)
! where EPSij = sqrt(EPSi * EPSj)
!       Rij*  = (Ri* + Rj*)/2

!Ver Ref I     Ri*      EPSi
1.0  3  IM   5.0000  0.10000
1.0  3  CU   2.4000  0.05000
1.0  3  I    4.8000  0.40000
1.0  3  OW   3.5360  0.15200
1.0  3  MG   2.3400  0.10000
1.0  3  CO   3.2000  0.10000
1.0  3  QC   6.8000  0.00008
1.0  3  QK   5.3200  0.00033
1.0  3  QL   2.2800  0.01800
1.0  3  QN   3.7400  0.00280
1.0  3  QR   5.9200  0.00017
1.0  1  C    3.7000  0.12000
1.0  1  C*   3.7000  0.12000
1.0  1  C2   3.8400  0.12000
1.0  1  C3   4.0000  0.15000
1.0  1  CA   3.7000  0.12000
1.0  1  CB   3.7000  0.12000



OH 14184.0000 3082.0000

S 265720.0000 35429.0000

SH 265720.0000 35429.0000

1.0  1  CC   2.7000  0.12000
1.0  1  CD   3.7000  0.12000
1.0  1  CE   3.7000  0.12000
1.0  1  CF   3.7000  0.12000
1.0  1  CG   3.7000  0.12000
1.0  1  CH   3.7000  0.09000
1.0  1  CI   3.7000  0.12000
1.0  1  CJ   3.7000  0.12000
1.0  1  CK   3.7000  0.12000
1.0  1  CM   3.7000  0.12000
1.0  1  CN   3.7000  0.12000
1.0  1  CP   3.7000  0.12000
1.0  1  CQ   3.7000  0.12000
1.0  1  CR   3.7000  0.12000
1.0  1  CT   3.6000  0.06000
1.0  1  CV   3.7000  0.12000
1.0  1  CW   3.7000  0.12000
1.0  1  H    2.0000  0.02000
1.0  1  H2   2.0000  0.02000
1.0  1  H3   2.0000  0.02000
1.0  1  HC   3.0800  0.01000
1.0  1  HO   2.0000  0.02000
1.0  1  HS   2.0000  0.02000
1.0  1  LP   2.4000  0.01600
1.0  1  N    3.5000  0.16000
1.0  1  N*   3.5000  0.16000
1.0  1  N2   3.5000  0.16000
1.0  1  N3   3.7000  0.08000
1.0  1  NA   3.5000  0.16000
1.0  1  NB   3.5000  0.16000
1.0  1  NC   3.5000  0.16000
1.0  1  NP   3.5000  0.16000
1.0  1  NT   3.7000  0.12000
1.0  1  O    3.2000  0.20000
1.0  1  O2   3.2000  0.20000
1.0  1  OH   3.3000  0.15000
1.0  1  OS   3.3000  0.15000
1.0  1  P    4.2000  0.20000
1.0  1  S    4.0000  0.20000
1.0  1  SH   4.0000  0.20000
1.1  4  CS   3.6000  0.09030
1.1  4  AC   3.6000  0.09030
1.1  4  BC   3.6000  0.09030
1.1  4  C    3.7000  0.12000
1.1  4  H    2.0000  0.02000
1.1  4  HY   1.6000  0.04980
1.1  4  HT   2.9360  0.00450
1.1  4  HO   2.0000  0.02000
1.1  4  AH   2.9360  0.00450
1.1  4  BH   2.9360  0.00450
1.1  4  OT   3.2000  0.15910
1.1  4  OA   3.2000  0.15910
1.1  4  OB   3.2000  0.15910
1.1  4  OE   3.2000  0.15910
1.1  4  OH   3.3000  0.15000
1.1  4  O    3.2000  0.20000
1.1  4  N    3.5000  0.16000
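
The LJ_PARAM.DAT header defines a 12-6 energy E = EPSij * ((Rij*/Rij)^12 - 2 (Rij*/Rij)^6), with EPSij = sqrt(EPSi * EPSj) and Rij* = (Ri* + Rj*)/2. The following is a minimal sketch of that calculation for three atom types copied from the table. The function names are hypothetical, and the units (angstroms and kcal/mol) are assumed from common AMBER usage.

```python
import math

# Atom type -> (Ri*, EPSi), copied from the table.
LJ_PARAMS = {
    "C": (3.7000, 0.12000),
    "OW": (3.5360, 0.15200),
    "H": (2.0000, 0.02000),
}


def combine(type_i, type_j):
    """Return (Rij*, EPSij) using the combination rules in the file header."""
    r_i, eps_i = LJ_PARAMS[type_i]
    r_j, eps_j = LJ_PARAMS[type_j]
    return (r_i + r_j) / 2.0, math.sqrt(eps_i * eps_j)


def lj_energy(type_i, type_j, r):
    """12-6 energy E = EPSij * ((Rij*/r)^12 - 2 * (Rij*/r)^6)."""
    r_star, eps = combine(type_i, type_j)
    ratio6 = (r_star / r) ** 6
    return eps * (ratio6 * ratio6 - 2.0 * ratio6)


r_star, eps = combine("C", "OW")
e_at_minimum = lj_energy("C", "OW", r_star)
```

In this form the energy at r = Rij* is exactly -EPSij, so Ri* locates the minimum of the curve and not its zero crossing.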



DATA FILE FOR TORSION FORCES - TORSION.DAT

179 !total entries in this data file
!BIOSYM forcefield 2
!version amber.frc 1.0 19-Oct-90
!version amber.frc 1.1 8-Aug-92
!define amber

! This is the new format version of the amber forcefield
!torsion_3 amber

! E = SUM(n=1,3) { V(n) * [ 1 + cos(n*Phi - Phi0(n)) ] }

!Ver  Ref  I  J  K  L  V1  Phi0  V2  Phi0  V3  Phi0


1.0 1 O C C2 N

0.0000 0.0 0.2000 180.0 

1.0 1 O C CH C2 

0.0000 0.0 0.1000 180.0 



0.0000 0.0 



0.0000 



0.0 
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1.0 1 o 
0.0000 0.0 

1.0 1 0 
0.0000 0.0 

1-0 1 OS 
0.5000 0.0 

1.0 2 OH 
0.5000 0.0 

1.0 1 OS 
0.5000 0.0 

1.0 1 OS 
0.5000 0.0 

1.0 1 OS 
0.5000 0.0 

1.0 1 OH 
0.5000 0.0 

1.0 1 C2 
0.0000 0.0 

1.0 1 CH 
0.0000 0.0 
1.0 1 OS 

0- 5000 0.0 
1.0 1 OH 

0.5000 0.0 

1.0 1 OS 
0.5000 0.0 

1.0 1 OS 
0.5000 0.0 

1-0 1 HC 
1.7100 180.0 

1.0 1 C 

6.5900 180.0 

1.0 1 N* 

6.5900 180,0 

1.0 1 CA 

6.5900 180.0 

1.0 1 N» 

9.5100 180.0 

1- 0 1 HC 



C C3J N 

0.1000 180.0 

C CH CH 

0.1000 180.0 

C2 C2 OH 

2.0000 0,0 

C2 C2 OH 

2.0000 0.0 

C2 C2 OS 
2.0000 0,0 

C2 CH OS 

1.0000 0.0 

C2 CH OH 

1.0000 0.0 

C2 CH OH 

1.0000 0.0 

C2 S LP 

0.0000 0.0 

C2 SH LP 

0.0000 0.0 

CH C2 OH 
1.0000 0.0 

CH CH OH 
0.5000 0,0 

CH CH OH 
0.5000 0.0 

CH CH OS 
0-5000 0.0 

CM CM CT 
0.0000 0.0 

CM CM HC 
0.0000 0.0 

CM CM CT 
0.0000 0.0 

CM CM HC 
0.0000 0.0 

CM CM CA 
0,0000 0.0 
CM CM HC 



0.0000 0.0 

0.0000 0,0 

0.0000 0.0 

0.0000 0.0 

0.0000 0.0 

0.0000 0.0 

0,0000 0.0 

0,0000 0.0 

0.0000 0.0 

0.0000 0.0 

0.0000 0.0 

0.0000 0.0 

0.0000 0.0 

0.0000 0,0 

0.0000 0.0 

0.0000 0.0 

0.0000 0.0 

0.0000 0.0 

0.0000 0.0 

0.0000 0.0 
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1.7100 180.0 0.0000 0.0 

1.0 1 CM CM C 

9.5100 180.0 0.0000 0.0 

1.0 1 CM CM HC 

6.5900 180.0 0.0000 0.0 

1.0 1 N CT C 0 

0.0000 0.0 0.0670 180.0

1.0 1 HC CT C 0 

0.0000 0.0 0.0670 180,0 

1.0 1 CT CT C 0 

0.0000 0.0 0.0670 180.0 

1.0 1 CT OS CT CT 

0.2000 180.0 0.3830 0.0 

1.0 1 OS CT CT OS 

0.5000 0.0 0.1440 0,0 

1.0 1 OS CT CT OH 

0.5000 0.0 0.1440 0.0 

1.0 1 OH CT CT OH 

0,5000 0.0 0.1440 0.0 

1.0 1 H N C 0 

2.5000 160.0 0.0000 0.0 

1.0 1 C2 OS C2 C3 

0.1000 0.0 0.7250 0.0 

1.0 1 C2 OS C2 C2 

0.1000 0.0 1.4500 0.0 

1.0 1 C3 OS C2 C3 

0.1000 0.0 1.4500 0.0 

1.0 1 CH OS CH C2 

0.1000 0.0 0.7250 0.0 

1.0 1 CH OS CH CH 

0.1000 0.0 0.7250 0.0 

1.0 1 C2 OS CH C2 

0.1000 0.0 0.7250 0.0 

1.0 1 C3 OS CH C3 

0.1000 0.0 0.7250 0,0 

1.0 1 CH OS CH N* 

0.0000 0.0 0.7250 0.0 

1.0 1 C2 OS CH C3 

0.1000 0.0 0.7250 0.0 




0.0000 0.0 

0.0000 0.0 

0.0000 0.0 

0,0000 0.0 

0,0000 0.0 

0.0000 0.0 

0,0000 0.0 

0.0000 0.0 

0.0000 0.0 

0.6500 0.0 

0.0000 0.0 

0.0000 0.0 

0.0000 0.0 

0.0000 0.0 

0.0000 0.0 

0.0000 0.0 

0.0000 0.0 

0.0000 0.0 

0.0000 0.0 
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1.0 



OH 



0.7500 0.0 

1.0 1 
0.75O0 0.0 

1.0 1 
0.7500 0.0 



OS 



OH 



1.0 1 OS 
0.7500 0.0 

1.0 1 OS 
0.7500 0.0 

1.0 1 OS 
0.7500 0.0 

1.0 1 OH 
0.7500 0.0 

1.0 1 OH 
0.7500 0.0 

1.0 1 LP 
0.0000 0.0 

1.0 1 LP 
0.0000 0.0 

1.0 1 C2 
3.5000 0.0 

1.0 1 CT 
3.5000 0.0 

1.0 1 LP 
0.0000 0.0 

1.1 4 OE 
0.0000 0.0 

1.1 4 AH 
1.7500 60.0 

1.1 4 CS 
0.0000 0.0 

1-1 4 OE 
0.0000 0.0 

1.1 4 AH 
1.7500 60.0 

1 .1 4 CS 
0.0000 0.0 

1.1 4 OE 



P 

0.2500 
P 

0.2500 
P 

0.2500 
P 

0.2500 
P 

0.2500 
P 

0.2500 
P 

0.2500 
P 

0 .2500 
S 

0.0000 
S 

0 .0000 
S 

0.6000 

s 

0.6000 
S 

0.0000 

AC 
0.0000 

AC 
0.0000 

AC 
0.8500 

AC 
0.0000 

AC 
0.0000 

AC 
0.8500 

BC 



OS C3 

0.0 
OS C2 

0.0 
OS C2 

0.0 
OS CT 

0.0 
OS CH 

0.0 
OS C3 

0.0 
OS CH 

0.0 
OS CT 

0.0 
S LP 

0.0 
S C2 

0.0 
S C2 

0.0 
S CT 

0.0 
S CT 

0.0 
OA CS 

0.0 
OA CS 

0.0 
OA CS 

0.0 
OA HY 

0.0 
OA HY 

0.0 
OA HY 

0.0 
OB CS 




0.0000 0.0 

0.0000 0.0 

0.0000 0.0 

0.0000 0.0 

0.0000 0.0 

0.0000 0.0 

0.0000 0.0 

0.0000 0.0 

0.0000 0.0 

0.0000 0.0 

0.0000 0.0 

0.0000 0.0 

0.0000 0.0 

2.1500 300.0 

0.0000 0.0 

0.0000 0.0 

2.1500 300.0 

0.0000 0.0 

0.0000 0.0 

-1.0500 0.0




0.0000 0.0 

1.1 4 BH 
1.2500 240.0 

1.1 4 CS 
0.0000 0.0 

1.1 4 OE 
0.0000 0.0 

1.1 4 BH 
1.2500 240.0 

1.1 4 CS 
0.0000 0.0 

1.14 HT 
0.0000 0.0 

1.14 HT 
0.0000 0.0 

1.14 H 
2.5000 180.0 

1.14 HT 
0.0000 0.0 

1.0 3 * 
5.3000 180.0 

1.0 1 ♦ 
0.0000 0.0 

1.0 1 * 
5.3000 180.0 

1.0 1 * 
4.4000 180.0 

1.0 1 * 
5.3000 180.0 

1.0 1 * 
0.0000 0.0 

1.0 1 * 
3.1000 180.0 

1.0 1 * 

3.1000 180.0 

1.0 1 * 
0.0000 0.0 

1.0 1 ♦ 
10.0000 180.0 



0.0000 0.0 

BC OB CS 
0.0000 0.0 

BC OB CS 
1.4000 0.0 

BC OB HY 
0.0000 0.0 

BC OB HY 
0.0000 0.0 

BC OB HY 
1.4000 0.0 

AC OA CS 
0.8500 0.0 

BC OB CS 
1.4000 0.0 

N C 0 

0.0000 0.0 

CS C 0 
0.0670 180.0

CB CD * 
0.0000 0.0 

C C2 * 

0.0000 180.0 

C CA * 

0.0000 0.0 

C CB ♦ 

0.0000 0.0 

C CD * 

0.0000 0.0 

C CH * 

0.0000 0-0 

C CJ * 

0.0000 0.0 

C CM * 

0.0000 0.0 

C CT * 

0.0000 0.0 

C N * 

0.0000 0-0 




0.0000 0.0 

0.0000 0.0 

-1.0500 0.0 

0.0000 0.0 

0.0000 0.0 

0.0000 0.0 

0.0000 0.0 

0.6500 0.0 

0.0000 0.0 

0.0000 0.0 

0.0000 0.0 

0.0000 0.0 

0.0000 0.0 

0.0000 0.0 

0.0000 0.0 

0.0000 0.0 

0.0000 0.0 

0.0000 0.0 

0.0000 0.0 
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1.0 1 * 
5.8000 180.0

1.0 1 ♦ 
5.4000 180.0 

1.0 1 * 
8.0000 180.0 

1.0 1 * 
1.8000 180.0 

1.0 1 ♦ 
0.0000 0.0 

1.0 1 * 
4.8000 180.0 

1.0 1 ♦ 
23.6000 180.0 

1.0 1 * 
0.0000 0.0 

1.0 1 * 
23.6000 180.0 

1.0 1 * 
0.0000 0.0 

1.0 1 ♦ 
0.0000 0.0 

1.0 1 ♦ 
0.0000 0.0 

1.0 1 ♦ 
0.0000 0.0 

1.0 1 ♦ 
0.0000 0.0 

1.0 1 ♦ 
0.0000 0.0 

1.0 1 • 
0.0000 0.0 

1.0 1 * 
0.0000 0.0 

1.0 1 * 
0.0000 0.0 

1.0 1 * 
0.0000 0.0 

1.0 1 ♦ 



C N* ♦ 

0.0000 0,0 

C NA ♦ 

0.0000 0.0 

C NC * 

0.0000 0.0 

C OH * 

0.0000 0.0 

C* C2 * 
0.0000 0.0 

C* CB * 
0.0000 0.0 
C* CG * 

0.0000 0.0 

C* CT * 

0.0000 0.0 

c* cw ♦ 

0.0000 0.0 

C2 C2 * 

2.0000 0.0 

C2 CA ♦ 
0.0000 0.0 

C2 CC * 
0.0000 0.0 

C2 CH ♦ 
2.0000 0.0 

C2 N ♦ 
0.0000 0.0 

C2 N2 * 
0.0000 0.0 

C2 N3 * 
1.4000 0.0 

C2 NT ♦ 
1.0000 0.0 

C2 OH * 
0.5000 0.0 

C2 OS * 
1.4500 0.0 

C2 S * 




0.0000 0.0 

0.0000 0.0 

0.0000 0.0 

0.0000 0.0 

0.0000 0.0 

0.0000 0.0 

0.0000 0.0 

0.0000 0.0 

0.0000 0.0 

0.0000 0.0 

0.0000 0.0 



0 .0000 



0.0000 



0.0 
0.0 



0.0000 0.0 



0.0000 0.0 



0.0000 0.0 



0.0000 0.0 



0.0000 0.0 



0 .0000 
0.0000 



0.0 



0.0 
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0.0000 0.0 

1.0 1 * 
0.0000 0.0 

1.0 1 * 
5.3000 180.0

1.0 1 ♦ 
10.2000 180.0 
1.0 1 * 
5.3000 180.0 

1.0 1 * 
3.7000 180.0 

1.0 1 * 
3.7000 180.0 

1.0 1 ♦ 
10.6000 180.0 
1,0 1 * 
0.0000 0.0 

1.0 1 * 
6.8000 180.0 

1.0 1 * 
6.0000 180.0 

1.0 1 ♦ 
9.6000 180.0 

1.0 1 ♦ 
16.3000 180,0 

1.0 1 * 
20.0000 180.0 
1.0 1 * 
6.6000 180.0 

1.0 1 ♦ 
5.1000 180.0 

1.0 3 * 
8.3000 180.0 

1.0 1 ♦ 
14,3000 180,0 

1.0 1 * 
15.9000 180,0 
1.0 1 * 
0.0000 0.0 



1.0000 0.0 

C2 SH * 

0.7500 0.0 

CA CA ♦ 
0.0000 0.0 
CA CB ♦ 
0,0000 0.0 
CA CD ♦ 
0.0000 0.0 

CA CJ ♦ 
0,0000 0.0 

CA CM * 
0.0000 0.0 
CA CN * 

0.0000 0.0 
CA CT * 

0.0000 0.0 

CA N2 * 
0.0000 0.0 

CA NA ^ 
0,0000 0.0 

CA NC ♦ 
0.0000 0.0 
CB CB * 

0.0000 0.0 
CB CN ♦ 

0.0000 0.0 
CB N* ♦ 
0.0000 0.0 

CB NB ♦ 
0.0000 0.0 

CB NC * 
0,0000 0.0 
CC CF ♦ 
0.0000 0.0 
CC CG * 
0.0000 0.0 
CC CT ♦ 
0.0000 0.0 
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0.0000 0,0 

0.0000 0.0 

0.0000 0.0 

0.0000 0.0 

0.0000 0.0 

0.0000 0.0 

0.0000 0.0 

0.0000 0.0 

0.0000 0,0 

0.0000 0.0 

0,0000 0.0 

0.0000 0.0 

0.0000 0.0 

0.0000 0.0 

0.0000 0.0 

0.0000 0.0 

0.0000 0,0 

0.0000 0.0 

0.0000 0.0 
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1.0 1 ♦ 
14.3000 180.0 

1.0 1 * 
15.9000 180.0 

1.0 1 * 
5.6000 180.0 

1.0 1 * 
4.8000 180.0 

1.0 1 * 
5.3000 180.0 

1.0 1 * 
5.3000 180.0 

1.0 1 * 
6.7000 180.0 

1.0 1 ♦ 
20.0000 180.0 

1.0 1 ♦ 
4.8000 180.0 

1.0 1 • 
6.0000 180.0 

1.0 1 * 
0.0000 0.0 

1.0 1 ♦ 
0.0000 0.0 

1.0 1 ♦ 
0.0000 0.0 

1.0 1 * 
0.0000 0.0 

1.0 1 ♦ 
0.0000 0.0 

1.0 1 * 
0.0000 0.0 

1.0 1 ♦ 
13.5000 180.0 

1.0 1 ♦ 
24.4000 180.0 

1.0 1 * 
24.4000 180.0 

1.0 1 » 



CC CV * 

0.0000 0.0 

CC cw ♦ 

0.0000 0.0 

CC NA ♦ 
0.0000 0.0 

CC NB * 
0.0000 0.0 

CD CD ♦ 
0.0000 0.0 

CD CN ♦ 
0.0000 0.0 

CB N* ♦ 
0.0000 0.0 

CE NB ♦ 

0.0000 0.0 

CF NB * 
0.0000 0.0 

CG NA * 
0.0000 0.0 

CH CH * 
2.0000 0.0 

CH N * 
0.0000 0.0 

CH N* * 
0.0000 0.0 

CH NT * 
1.0000 0.0 

CH OH * 
0.5000 0.0 

CH OS * 
1.4500 0.0 

CI NC ♦ 
0.0000 0.0 

CJ CJ ♦ 
0.0000 0.0 

CJ CM ♦ 
0.0000 0.0 

CJ N* ♦ 
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0.0000 0.0 

0.0000 0.0 

0.0000 0.0 

0.0000 0.0 

0.0000 0.0 

0.0000 0.0 

0.0000 0.0 

0.0000 0.0 

0.0000 0.0 

0.0000 0.0 

0.0000 0.0 

0.0000 0.0 

0.0000 0.0 

0.0000 0.0 

0.0000 0.0 

0.0000 0.0 

0.0000 0.0 

0.0000 0.0 

0.0000 0.0 

0.0000 0.0 
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7.4000 180.0 

1.0 1 * 
6.7000 180.0

1.0 1 * 
20.0000 180.0 

1.0 1 * 
24.4000 180.0 

1.0 1 * 
0.0000 0.0 

1.0 1 * 
7.4000 180.0 

1.0 1 * 
12.2000 180.0 

1.0 1 
9.3000 180.0 

1.0 1 ♦ 
10.0000 180.0 

1.0 1 ♦ 
13.5000 180.0 

1.0 1 * 
9.3000 180.0 

1.0 1 * 
10.0000 180.0 

1.0 1 * 
0.0000 0.0 

1.0 1 * 
0.0000 0.0 

1.0 1 * 
0.0000 0.0 

1.0 1 * 
0.0000 0.0 

1.0 1 ♦ 
0.0000 0.0 

1.0 1 * 
0.0000 0.0 

1.0 1 * 
0.0000 0.0 

1.0 1 * 
0.0000 0.0 



0.0000 0.0 
CK N* * 

0.0000 0.0 
CK NB * 
0.0000 0.0 
CM CM * 
0.0000 0.0 
CM CT * 

0.0000 0.0 
CM N* * 

0.0000 0.0 
CN NA ♦ 
0.0000 0.0 
CP NA * 

0.0000 0.0 
CP NB * 
0.0000 0.0 
CQ NC * 
0.0000 0.0 
CR NA ♦ 

0.0000 0.0 
CR NB ♦ 
0.0000 0.0 
CT CT * 

1.3000 0.0 
CT N * 

0.0000 0.0 
CT N* * 

0.0000 0.0 
CT N2 • 

0.0000 0.0 
CT N3 * 

1.4000 0.0 
CT OH * 

0.5000 0.0 
CT OS ♦ 

1.1500 0.0 
CT S ♦ 

1.0000 0.0 



0.0000 0.0 



0.0000 0.0 



0.0000 0.0 



0.0000 0.0 



0.0000 0.0 



0.0000 0.0 



0.0000 0.0 



0.0000 0.0 



0.0000 0.0 



0.0000 0.0 



0.0000 0.0 



0.0000 0.0 



0.0000 0.0 



0.0000 0.0 



0.0000 
0.0000 
0 .0000 



0.0 



0.0 



0.0 



0.0000 0.0 
0.0000 0.0 
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1.0 1 * 
0.0000 0.0 

1.0 1 * 
4.8000 180.0 

1.0 1 * 
6.0000 180.0 

1.0 1 ♦ 
0.0000 0.0 

1.0 1 * 
0.0000 0.0 

1.14 * 
0,0000 0.0 

1.1 4 * 
0.0000 0.0 

1.1 4 ♦ 
0.0000 0.0 

1.1 4 ♦ 
0.0000 0.0 

1.14 ♦ 
0.0000 0.0 

1.14 * 
0.0000 0.0 

1.14 ♦ 
0.0000 0.0 

1,1 4 * 
0.0000 0.0 

1.14 ♦ 
0.0000 0.0 

1,14 * 
0.0000 0.0 

1,1 4 ♦ 
0.0000 0.0 

1.1 4 * 
0.0000 0.0 

1.1 4 ♦ 
0.0000 0.0 

1.1 4 * 
10.0000 180,0 
1.1 4 * 



CT SH * 
0.7500 0,0 

CV NB ♦ 
0.0000 0.0 

CW NA * 
0.0000 0.0 

OH P * 
0.7500 0.0 

OS P * 
0.7500 0.0 

CS CS * 
1.0210 0.0 

CS CT * 

1.0210 0.0 

AC CS * 
1.0210 0.0 

BC CS * 
1.0210 0.0 

CS OT * 
0,4430 0.0 

CS OE ♦ 
0.9280 0.0 

AC OE * 
0.9280 0.0 

BC OE * 
0.9280 0.0 

AC OA * 
0,0000 0.0 

BC OB ♦ 
0,0000 0.0 

CS OA ♦ 
0.0000 0.0 

CS OB * 
0,0000 0.0 

CS N * 
0.0000 0.0 
C N * 

0.0000 0,0 
C CS ♦ 
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0.0000 0.0 

0.0000 0.0 
0.0000 0.0 
0.0000 0.0 
0.0000 0.0 

0.0000 0.0 

0.0000 0.0 

0.0000 0.0 

0.0000 0.0 

0.0000 0.0 

0.0000 0.0 

0.0000 0.0 

0.0000 0.0 

0-0000 0.0 

0.0000 0.0 

0.0000 0.0 

0.0000 0,0 

0.0000 0.0 

0.0000 0,0 

0.0000 0.0 
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0.0000 0.0 0.0000 0.0 

1.0 1 * CT NT * 0.0000 0.0 

0.0000 0.0 1.8000 0.0 
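
The TORSION.DAT header gives the three-term torsion energy E = SUM(n=1,3) V(n) * [1 + cos(n*Phi - Phi0(n))]. The following is a minimal sketch of evaluating it. The function name is hypothetical, and the example terms (V2 = 0.2000 with Phi0 = 180.0, the other two terms zero) are an assumed reading of the first O-C-C2-N row, whose columns are split across lines in this copy.

```python
import math


def torsion_energy(phi_deg, terms):
    """E = sum over n = 1..3 of V(n) * (1 + cos(n*phi - phi0(n))).

    phi_deg and each phi0 are in degrees; terms is a list of three
    (V, phi0_deg) pairs for n = 1, 2, 3.
    """
    phi = math.radians(phi_deg)
    energy = 0.0
    for n, (v, phi0_deg) in enumerate(terms, start=1):
        energy += v * (1.0 + math.cos(n * phi - math.radians(phi0_deg)))
    return energy


# Assumed reading of one row: only the twofold term is non-zero.
terms = [(0.0, 0.0), (0.2, 180.0), (0.0, 0.0)]
e_planar = torsion_energy(0.0, terms)
e_perpendicular = torsion_energy(90.0, terms)
```

With a twofold term and Phi0 = 180.0, the planar arrangements (Phi = 0 or 180) sit at the energy minimum and the perpendicular arrangement (Phi = 90) at the maximum of 2 * V2.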

************************************************************

DATA FILE - CX6C.CAR 
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6 
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-0, 
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CT 


C 




0.035 


















HA1 1.550908149 3.403064022

HC H 0.032 

HA2 -0.097660558 4.132736815 

HC H 0.032 

C 0.730531165 3.827591429 

C C 0.616 

O 1.559375145 4.206208097 

O O -0.504

N -0.320742949 3.103195380 

N N -0.463 

HN -0.976177839 2.817016114 

H H 0.252 

CA -0.454134161 2.787581074 

CT C 0.035 

HA1 -0.907422830 1.783240810

HC H 0.032 

HA2 -1.127648566 3.540414569 

HC H 0.032 

C 0.896974016 2.736484179 

C C 0.616 

O 1.315189212 1.712629073

O O -0.504 

N 1.599575272 3.853622667 

N N -0.463 

HN 1.137216234 4.691535216 

H H 0.252 

CA 2.905944550 3.804217731 

CT C 0.035 

HA1 3.056204584 2.789614618

HC H 0.032 

HA2 2.897891721 4.540755026 

HC H 0.032 

C 4.014980067 4.050747291 

C C 0.616 

O 4.978871195 4.780583329 

O O -0.504

N 3.887759074 3.450944950 

N N -0.463 

HN 3.003276191 2.844372268 
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-0.212395307 GhY 2 

-0.116611463 GLY 2 

-2.120728786 GLY 2 

-2.957020570 GLY 2 

-2.456098946 GLY 3 

-1.646836012 GLY 3 

-3.875321662 GLY 3 

-3.972773051 GLY 3 

-4.323795441 GLY 3 

-4.547627543 GLY 3 

-5.101282348 GLY 3 

-4.520184621 GLY 4 

-4.019658253 GLY 4 

-5.170228610 GLY 4 

-5.584558431 GLY 4 

-5.994216851 GLY 4 

-4.175561433 GLY 4 

-4.436272241 GLY 4 

-3.006608050 GLY 5 

-2.879487738 GLY 5 
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N 1.936206121 7.605756209 

N N -0.463 

HN 1.983632457 6.528240768 

H H 0.252 

CA 1.485796919 8.428968216 

CT C 0.035 

HA 0.399931102 8.271042216 

HC H 0.032 

C 2.167493478 8.018162291 

C C 0.616 

CB 1.746659419 9.902481747 

CT C -0.098 

HB1 2.709270705 10.016688002

HC H 0.050 

HB2 1.816139488 10.541353385 

HC H 0.050 

SG 0.440719361 10.532225816 

S S 0.824 

LG1 -0.404239097 10.957145937

LP L -0.405 

LG2 0.793091788 11.329491558 

LP L -0.405 

end 



1.35S640986 CYSN 8 

1.414418956 CYSN 8 

0.240136508 CYSN 8 

0.100059529 CYSN 8 

1.043072620 CYSN 8 

0.610166221 CYSN 8 

1.140264476 CYSN 8 

0.293951287 CYSN 8 

1.688457720 CYSN 8

1.126774557 CYSN 8 

2.359427872 CYSN 8 
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SEQUENCE LISTING 



(1) GENERAL INFORMATION: 

(i) APPLICANT: Deem, Michael W. 

Rothberg, Jonathan M. 
Went, Gregory T. 

(ii) TITLE OF INVENTION: CONSENSUS CONFIGURATIONAL BIAS MONTE
CARLO METHOD AND SYSTEM FOR PHARMACOPHORE STRUCTURE
DETERMINATION

(iii) NUMBER OF SEQUENCES: 10 

(iv) CORRESPONDENCE ADDRESS: 

(A) ADDRESSEE: Pennie & Edmonds

(B) STREET: 1155 Avenue of the Americas

(C) CITY: New York

(D) STATE: New York
(E) COUNTRY: USA
(F) ZIP: 10036-2711

(v) COMPUTER READABLE FORM:

(A) MEDIUM TYPE: Floppy disk

(B) COMPUTER: IBM PC compatible

(C) OPERATING SYSTEM: PC-DOS/MS-DOS

(D) SOFTWARE: PatentIn Release #1.0, Version #1.30

(vi) CURRENT APPLICATION DATA:

(A) APPLICATION NUMBER: To Be Assigned

(B) FILING DATE: On Even Date Herewith

(C) CLASSIFICATION: 

(viii) ATTORNEY/AGENT INFORMATION:

(A) NAME: Misrock, S. Leslie

(B) REGISTRATION NUMBER: 18,872

(C) REFERENCE/DOCKET NUMBER: 7934-007 

(ix) TELECOMMUNICATION INFORMATION: 

(A) TELEPHONE: (212) 790-9090 

(B) TELEFAX: (212) 869-9741/8864 

(C) TELEX: 66141 PENNIE 



(2) INFORMATION FOR SEQ ID NO:1:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 8 amino acids 

(B) TYPE: amino acid 
(D) TOPOLOGY: unknown 

(ii) MOLECULE TYPE: peptide 



(ix) FEATURE: 

(A) NAME/KEY: Disulfide-bond

(B) LOCATION: 1..8

(D) OTHER INFORMATION: /note= "A disulfide bond is formed
between the cysteine residues."



(xi) SEQUENCE DESCRIPTION: SEQ ID NO:1:

Cys Xaa Xaa Xaa Xaa Xaa Xaa Cys

1               5
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(2) INFORMATION FOR SEQ ID NO: 2: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 102 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO:2:
ACTTCGAAAT TAATACGACT CACTATAGGG AGACCACAAC GCTTTCCCTC CAGAAATAAT 60
TTTCTTTAAC TTTAACTTTA AGAAGGAGAT ATACATATCC AT 102
(2) INFORMATION FOR SEQ ID NO:3:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 83 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 3: 

CCCAGACCCG CCCCCAGCAT TCTGGGTTCC AACCCCCTCT AGACAHNNMN NMNNMNNMNN 60 

HNNACAATCT ATATCTCCTT CTT 83 

(2) INFORMATION FOR SEQ ID NO: 4:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 48 base pairs

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 4:

TCGTCTCACC TCCCTCAACC TCCCCACAAT GCTCCCGCCG GCTCTCGT 48 

(2) INFORMATION FOR SEQ ID NO: 5: 

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 42 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 5:

ATCAACTTTG CCTTTACCAG CATTGTGCAG CGCGTTTTCA TC 42 

(2) INFORMATION FOR SEQ ID NO: 6: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 10 amino acids
(B) TYPE: amino acid
(D) TOPOLOGY: unknown

(ii) MOLECULE TYPE: peptide 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 6: 

Met His Cys Xaa Xaa Xaa Xaa Xaa Xaa Cys
1               5                   10

(2) INFORMATION FOR SEQ ID NO: 7: 

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 8 amino acids
(B) TYPE: amino acid 
(D) TOPOLOGY: unknown 

(ii) MOLECULE TYPE: peptide 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 7: 

Cys Gly Gly Gly Gly Gly Gly Cys 
1 5 

(2) INFORMATION FOR SEQ ID NO: 8:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 30 base pairs

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: unknown 

(ii) MOLECULE TYPE: DNA 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 8: 
NKKNNKNNKK NKNNKNNKNN KNNKNNKNNK 30 
(2) INFORMATION FOR SEQ ID NO: 9: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 47 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA 
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(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 9: 
ACTTCCAAAT TAATACGACT CACTATAGGC ACACCACAAC CGTTTCC 47
(2) INFORMATION FOR SEQ ID NO: 10:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 9 amino acids 

(B) TYPE: amino acid 
(D) TOPOLOGY: unknown 

(ii) MOLECULE TYPE: peptide



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 10: 

Cys Asn Thr Leu Lys Gly Asp Cys Gly
1               5
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WHAT IS CLAIMED IS: 

1. A method of determining a consensus pharmacophore
structure comprising the steps of:

(a) identifying from one or more diversity libraries a
plurality of compounds that bind to a target
molecule,

(b) measuring one or more distances in one or more of
the compounds, and

(c) determining a consensus pharmacophore structure for
the compounds.

2. The method of claim 1 wherein said compounds are 
peptides, peptide derivatives, or peptide analogs.

3. The method of claim 2 wherein said compounds are peptides 
containing one or more cystines. 

4. The method of claim 3 wherein the peptides comprise the 
sequence CX^C (SEQ ID NO:1).

5. The method of claim 1 further comprising a step of 
selecting a plurality of candidate pharmacophores based 
on rules of chemical homology, the selected plurality of
candidate pharmacophores being used in step (c) to
determine the consensus pharmacophore structure.

6. The method of claim 5 wherein the rules of homology 
determine that two candidate pharmacophores are
homologous if they have chemically similar side chains.

7. The method of claim 1 which further comprises after said 
identifying step, a screening step involving a genetic 
selection technique. 

8. The method of claim 1 wherein the step of measuring
distance comprises making solid phase nuclear magnetic
resonance measurements on selected nuclei in a nuclear
magnetic resonance spectrometer upon a sample comprising
one of the compounds.

9. The method of claim 8 wherein the step of measuring
distances further comprises making rotational echo double
resonance nuclear magnetic resonance measurements of
internuclear dipole-dipole interaction strength between
selected nuclei in the compound in the sample.

10. The method of claim 8 wherein the sample further
comprises a substrate having a surface to which the
compound is attached.

11. The method of claim 8 wherein the sample is cooled below
room temperature. 

12. The method of claim 8 wherein the compound is bound to 
the target molecule. 

13. The method of claim 10 wherein a plurality of the
compound is attached to the surface at a surface density
such that the inter-nuclear dipole-dipole interactions
between different molecules is less than 10% of the
inter-nuclear dipole-dipole interaction within one
molecule.
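
By way of illustration only, the 10% criterion recited above can be related to a minimum spacing between attached molecules. The sketch below assumes that the dipole-dipole interaction strength between two nuclei falls off as the inverse cube of their separation, and it ignores both the angular dependence and the summation over many neighbors. The function name `min_intermolecular_spacing` and its parameters are hypothetical and do not appear in the specification.

```python
def min_intermolecular_spacing(r_intra, max_ratio=0.10):
    """Smallest intermolecular distance keeping the coupling ratio below max_ratio.

    Assumes coupling strength ~ 1 / r**3 for a single pair of nuclei, so
    (r_intra / r_inter)**3 <= max_ratio  =>  r_inter >= r_intra * max_ratio**(-1/3).
    Units of the result follow the units of r_intra.
    """
    if r_intra <= 0 or not (0 < max_ratio < 1):
        raise ValueError("r_intra must be positive and max_ratio in (0, 1)")
    return r_intra * (1.0 / max_ratio) ** (1.0 / 3.0)


if __name__ == "__main__":
    # Hypothetical intramolecular distance of 4 (arbitrary length units).
    print(min_intermolecular_spacing(4.0))
```

Under these simplifying assumptions, a 10% ratio corresponds to an intermolecular spacing a little more than twice the intramolecular distance being measured.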



14. The method of claim 10 wherein the substrate has pores of 
sufficient size to permit the target to diffuse and bind
to the compound in the sample.

15. The method of claim 9 wherein rotational echo double 
resonance nuclear magnetic resonance measurements can be 
made on the compound bound to the target or hydrated or
in a dry nitrogen atmosphere.

16. The method of claim 10 wherein the compound is a peptide,
and a plurality of the peptide is attached to the
substrate surface, which has a purity of the peptide of
at least 95% and wherein the surface density of the
peptide is no more than one peptide per 100 of
substrate surface.

17. The method of claim 10 wherein the substrate is selected 
from the group consisting of p-methylbenzhydrylamine
resin, divinylbenzyl polystyrene resin, and glass beads.

18. The method of claim 8 wherein the selected nuclei are 
selected from the group consisting of ¹³C, ¹⁹F, and ³¹P.



19. The method of claim 9 wherein the nuclear magnetic
resonance spectrometer comprises magnetic excitation
means, a sample rotor, and free induction decay observing
means, and the step of making rotational echo double
resonance nuclear magnetic resonance measurements further
comprises the steps of:

(a) spinning the sample in the sample rotor,

(b) initially exciting magnetically the selected nuclei
to be observed,

(c) providing subsequently one π spin flip magnetic
excitation during each rotor period to each of the
selected nuclei, the pulses to the different nuclei
having fixed phase delays,

(d) observing the free induction decay signal as a
function of the number of rotor periods; and

(e) finding the dipole-dipole strength between the
selected nuclei, whereby the internuclear distance
between the selected nuclei can be obtained.
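
As an illustrative sketch only, the final step of obtaining an internuclear distance from a measured dipole-dipole strength can be written with the standard relation D = (mu0/4pi) * gamma_I * gamma_S * hbar / (2 * pi * r^3), with D in Hz. The table of nuclei, the function names, and the choice of units (Hz and angstroms) are assumptions made for this example. Fitting D from the observed signal as a function of the number of rotor periods is not shown.

```python
import math

MU0_OVER_4PI = 1.0e-7        # vacuum permeability / (4*pi), SI units
HBAR = 1.054571817e-34       # reduced Planck constant, J s
GAMMA = {                    # gyromagnetic ratios, rad s^-1 T^-1
    "13C": 6.728284e7,
    "19F": 25.18148e7,
    "31P": 10.8394e7,
}


def _prefactor(nuc_a, nuc_b):
    """Coupling constant in Hz * m^3 for a pair of nuclei."""
    return MU0_OVER_4PI * abs(GAMMA[nuc_a] * GAMMA[nuc_b]) * HBAR / (2.0 * math.pi)


def coupling_hz(nuc_a, nuc_b, r_angstrom):
    """Dipole-dipole coupling (Hz) for two nuclei separated by r_angstrom."""
    r_m = r_angstrom * 1.0e-10
    return _prefactor(nuc_a, nuc_b) / r_m ** 3


def distance_angstrom(nuc_a, nuc_b, d_hz):
    """Internuclear distance (angstroms) implied by a coupling of d_hz."""
    return (_prefactor(nuc_a, nuc_b) / d_hz) ** (1.0 / 3.0) * 1.0e10


if __name__ == "__main__":
    d = coupling_hz("13C", "31P", 5.0)
    print(d, distance_angstrom("13C", "31P", d))
```

Because the coupling varies as the inverse cube of the separation, a given relative error in the measured coupling produces only about one third of that relative error in the derived distance.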



20. The method of claim 1 wherein the step of measuring 
distances comprises making liquid phase nuclear magnetic
resonance measurements.

21. A method of determining a consensus pharmacophore
structure comprising the steps of:

(a) identifying from one or more diversity libraries a
plurality of compounds that bind to a target
molecule,

(b) determining a consensus pharmacophore structure for
the compounds.

22. A method of determining a consensus pharmacophore
structure comprising the steps of:

(a) measuring one or more distances in one or more
compounds that bind to a target molecule, and

(b) determining a consensus pharmacophore structure for
the compounds.

23. The method of claim 21 or 22 further comprising a step of
selecting a plurality of candidate pharmacophores based
on rules of chemical homology, the selected plurality of
candidate pharmacophores being used in step (b) to
determine the consensus pharmacophore structure.

24. The method of claim 23 wherein the compounds have limited
conformational degrees of freedom at the temperature of
interest, and wherein the step of determining a consensus
pharmacophore structure for each compound further
comprises performing a consensus configurational bias
Monte Carlo method, said Monte Carlo method comprising
the steps of:

(a) generating a proposed structure for a compound
identified from said one or more diversity libraries
by making conformational alterations consistent with
the conformational degrees of freedom, the
alterations being made to a representation of the
compound's current chemical and conformational
structure to generate a proposed representation, the
proposed structure being generated with a bias
toward more acceptable configurations of lower
energy, whereby the method is made more efficient,

(b) accepting and storing the proposed structure
according to a probability depending on an energy
determined for the proposed structure, and

(c) repeating these steps until sufficient structures
have been stored for each compound to permit
statistically significant determination of an
equilibrium structure for each compound.
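
The generate, accept, and repeat steps recited above may be sketched as follows. This is a simplified illustration that uses an unbiased Metropolis acceptance test on torsion angles in place of the configurational bias proposal of the claims. The names `toy_energy` and `sample_torsions`, the step size, and the value of kT are hypothetical choices made only for the example.

```python
import math
import random


def metropolis_accept(e_current, e_proposed, kT, rng):
    """Accept downhill moves always, uphill moves with Boltzmann probability."""
    if e_proposed <= e_current:
        return True
    return rng.random() < math.exp(-(e_proposed - e_current) / kT)


def sample_torsions(energy, n_torsions, n_steps, kT=0.6, max_step=0.5, seed=0):
    """Generate a chain of torsion-angle vectors (radians) and store each state."""
    rng = random.Random(seed)
    current = [0.0] * n_torsions
    e_current = energy(current)
    stored = []
    for _ in range(n_steps):
        # (a) propose: perturb one randomly chosen torsion angle
        proposal = list(current)
        i = rng.randrange(n_torsions)
        proposal[i] += rng.uniform(-max_step, max_step)
        e_proposed = energy(proposal)
        # (b) accept or reject according to an energy-dependent probability
        if metropolis_accept(e_current, e_proposed, kT, rng):
            current, e_current = proposal, e_proposed
        # store the state of the chain after this step
        stored.append(list(current))
    # (c) the caller repeats or extends the run until statistics suffice
    return stored


def toy_energy(torsions):
    """Hypothetical energy with a minimum at zero for every torsion."""
    return sum(1.0 - math.cos(t) for t in torsions)


if __name__ == "__main__":
    chain = sample_torsions(toy_energy, n_torsions=3, n_steps=2000)
    print(sum(toy_energy(s) for s in chain) / len(chain))
```

A configurational bias variant would instead draw each proposal from several trial conformations weighted toward lower energy and correct the acceptance probability for that bias.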



25. A method of determining one or more lead compounds for 
use as a drug that binds to a target molecule comprising 
the steps of: 

(a) identifying from one or more diversity libraries a
plurality of compounds that bind to a target
molecule;

(b) determining a consensus pharmacophore structure for
the compounds; and

(c) determining one or more lead compounds for use as a
drug which share a pharmacophore specification with
the determined consensus pharmacophore structure.

26. A method of determining one or more lead compounds for 
use as a drug that binds to a target molecule comprising
the steps of:

(a) measuring one or more distances in one or more
compounds that bind to a target molecule;

(b) determining a consensus pharmacophore structure for
the compounds; and

(c) determining one or more lead compounds for use as a
drug which share a pharmacophore specification with
the determined consensus pharmacophore structure.

27. The method according to claim 25 or 26 wherein said step
of determining one or more lead compounds comprises
modifying a compound identified as binding to the target
molecule, said modification being done outside of the
pharmacophore structure, to render the compound more
attractive for use as a drug.

28. The method of claim 5 wherein the compounds have limited
conformational degrees of freedom at a temperature of
interest, and wherein the step of determining a consensus
pharmacophore structure for the compounds further
comprises performing a consensus configurational bias
Monte Carlo method, said Monte Carlo method comprising
the steps of:

(a) generating a proposed structure for a compound
identified from said one or more diversity libraries
by making conformational alterations consistent with
the conformational degrees of freedom, the
alterations being made to a representation of the
compound's current chemical and conformational
structure to generate a proposed representation, the
proposed structure being generated with a bias
toward more acceptable configurations of lower
energy,

(b) accepting and storing the proposed structure
according to a probability depending on an energy
determined for the proposed structure, and

(c) repeating these steps until sufficient structures
have been stored for each compound to permit
statistically significant determination of an
equilibrium structure for each compound.

29. The method of claim 28 wherein the limited conformational
degrees of freedom comprise torsional rotations about
mutual bonds between otherwise rigid subunits of the
compound, each rigid unit's representation comprising its
interconnections and atomic composition, each atom's
representation comprising its type and position, the
torsional rotations respecting any conformational
constraints present.

30. The method of claim 28 wherein the compound is a peptide,
peptide derivative, or peptide analog.

31. The method of claim 28 wherein the conformational
alterations comprise constrained, concerted torsional
rotations or removal of a side chain and regrowth of the
side chain with a new torsional conformation.

32. The method of claim 31 wherein the constrained, concerted 
torsional rotations are constrained so that no more than
four rigid units are spatially displaced.

33. The method of claim 28 wherein determining the energy for 
the proposed structure of one compound comprises
including one or more constraint terms which represent
knowledge of measured structure for the compound.

34. The method of claim 33 wherein the constraint terms 
comprise a weighted sum of squares of differences of the
actual and measured structures.
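
A weighted sum of squares of this kind might be sketched as below, taking the "measured structure" to be a set of measured interatomic distances. That interpretation, the restraint tuple layout `(i, j, measured, weight)`, and the function name `constraint_energy` are assumptions made for illustration and are not taken from the program listings.

```python
import math


def constraint_energy(coords, restraints):
    """Weighted sum of squared deviations between model and measured distances.

    coords     : mapping from atom index to an (x, y, z) tuple of the
                 proposed structure
    restraints : iterable of (i, j, measured_distance, weight) tuples
    """
    total = 0.0
    for i, j, measured, weight in restraints:
        actual = math.dist(coords[i], coords[j])  # distance in the model
        total += weight * (actual - measured) ** 2
    return total


if __name__ == "__main__":
    coords = {0: (0.0, 0.0, 0.0), 1: (3.0, 4.0, 0.0)}
    print(constraint_energy(coords, [(0, 1, 4.0, 2.0)]))
```

The term vanishes when the proposed structure reproduces every measured distance and grows quadratically with each deviation, scaled by the weight assigned to that measurement.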

35. The method of claim 28 wherein the energy is determined 
for the proposed structure of one compound by a method
comprising including consensus terms which represent
knowledge that the identified compounds all bind to the
same target, the compounds being otherwise treated
independently by the method.

36. The method of claim 35 wherein the consensus terms are a 
weighted sum of squares of differences in the atomic
positions of a candidate pharmacophore from the average
values of these positions in all the compounds.
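
One way to picture such a consensus term is sketched below. It assumes that the candidate pharmacophore atoms of every compound are listed in the same order and are already expressed in a common coordinate frame; any superposition step is omitted. The function name `consensus_terms` and the single uniform `weight` are hypothetical.

```python
def consensus_terms(candidates, weight=1.0):
    """Per-compound weighted squared deviation from the mean pharmacophore.

    candidates : list with one entry per compound; each entry is a list of
                 (x, y, z) tuples for the candidate pharmacophore atoms,
                 in the same order for every compound
    Returns a list with one consensus term per compound.
    """
    n_compounds = len(candidates)
    n_atoms = len(candidates[0])
    # average position of each pharmacophore atom over all compounds
    mean = [
        tuple(sum(c[a][k] for c in candidates) / n_compounds for k in range(3))
        for a in range(n_atoms)
    ]
    # weighted sum of squared differences from that average, per compound
    return [
        weight * sum((c[a][k] - mean[a][k]) ** 2
                     for a in range(n_atoms) for k in range(3))
        for c in candidates
    ]


if __name__ == "__main__":
    print(consensus_terms([[(0.0, 0.0, 0.0)], [(2.0, 0.0, 0.0)]]))
```

Each compound's term would be added to that compound's own energy, so the compounds are coupled only through the shared average positions.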

37. The method of claim 35 wherein the step of determining 
the consensus pharmacophore structure comprises
determining from the plurality of selected candidate
pharmacophores a candidate pharmacophore for which the
consensus terms are relatively small compared to the
total energy.

38. The method of claim 35 wherein the step of determining 
the consensus pharmacophore structure comprises
determining from the plurality of selected candidate
pharmacophores a candidate pharmacophore for which the
consensus terms are minimum compared to other selected
regions.

39. The method of claim 28 wherein the equilibrium structure
is determined by a method comprising averaging selected
generated and accepted structures for each compound.

40. The method of claim 39 wherein the averaging of
structures comprises clustering selected generated and
accepted structures into sets of similar structures and
averaging these sets for each member.
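
The clustering and averaging recited above could take many forms. The sketch below uses a simple leader clustering with a root-mean-square cutoff, which is an assumption chosen for brevity and not necessarily the procedure used in the program listings. Structures are represented as flat lists of coordinates in a common frame; `cluster_and_average` and `cutoff` are hypothetical names.

```python
import math


def rms_difference(a, b):
    """Root-mean-square difference between two equal-length coordinate lists."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)) / len(a))


def cluster_and_average(structures, cutoff):
    """Group similar structures, then average each group.

    A structure joins the first cluster whose leader (first member) lies
    within `cutoff`; otherwise it starts a new cluster.
    Returns (list of averaged structures, list of cluster sizes).
    """
    clusters = []
    for s in structures:
        for members in clusters:
            if rms_difference(s, members[0]) <= cutoff:
                members.append(s)
                break
        else:
            clusters.append([s])
    averages = [
        [sum(column) / len(members) for column in zip(*members)]
        for members in clusters
    ]
    return averages, [len(members) for members in clusters]


if __name__ == "__main__":
    stored = [[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0]]
    print(cluster_and_average(stored, cutoff=0.5))
```

Averaging within clusters, instead of over the whole set of stored structures, avoids blending distinct conformations into an unphysical intermediate.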

41. A method of identifying a compound that binds to a target
molecule comprising the following steps in the order
stated:

(a) contacting compounds of a phage display or polysome-
based diversity library with a target molecule;

(b) identifying one or more compounds in the library
that bind to the target molecule;

(c) contacting one or more first fusion proteins, each
first fusion protein comprising an identified
compound, with a second fusion protein comprising
the target molecule or a binding portion thereof, in
which binding of the first fusion protein to the
second fusion protein results in an increase in
activity or activation of a transcriptional promoter
or an origin of replication; and

(d) identifying one or more of the compounds that when
present in said first fusion protein result in said
increase in activity or activation.

42. A method of making solid state nuclear magnetic resonance
measurements comprising measuring internuclear dipole-dipole
interaction strengths between selected nuclei in a
compound, said compound being attached to the surface of
a substrate.



43. The method of claim 42 which further comprises before 
said measuring step the step of synthesizing a plurality 
of said compound on the surface of the substrate. 


44. The method of claim 43 wherein said plurality of the 
compound is at least 95% pure. 

45. The method of claim 42 wherein a plurality of said
compound is attached to the substrate surface, with at
least 10 Å spacing between molecules of the compound.

46. The method of claim 42 wherein the substrate has pores of 
sufficient size to permit a molecule to diffuse and bind
to the compound.

47. The method of claim 42 wherein the substrate has a 
surface density of the compound such that the
inter-nuclear dipole-dipole interactions between different
molecules of the compound is less than 10% of the
inter-nuclear dipole-dipole interaction within one molecule of
the compound.

48. The method of claim 42 wherein the compound is a peptide, 
peptide derivative, or peptide analog.

49. The method of claim 42 wherein the substrate is selected 
from the group consisting of p-methylbenzhydrylamine
resin, divinylbenzyl polystyrene resin, and a glass bead.

50. The method of claim 42 wherein said measuring step 
comprises using a nuclear magnetic resonance
spectrometer, said spectrometer comprising magnetic
excitation means, a sample rotor, and free induction
decay observing means; and said measurement of
internuclear dipole-dipole interaction is done by a
method comprising the steps of:

(a) spinning the sample in the sample rotor;

(b) initially exciting magnetically the selected nuclei
to be observed;

(c) providing subsequently one or more π spin flip
magnetic excitations during each rotor period to one
or both of the selected nuclei, wherein pulses to
the different nuclei have fixed phase delays;

(d) observing a free induction decay signal as a
function of the number of rotor periods; and

(e) determining the dipole-dipole strength between the
selected nuclei, whereby the internuclear distance
between the selected nuclei can be obtained.

51. A method of configurational bias Monte Carlo
determination of the structure of a compound having
limited conformational degrees of freedom at a
temperature of interest, the method comprising the steps
of:

(a) generating a proposed structure for the compound by
making conformational alterations consistent with
the conformational degrees of freedom, the
alterations being made to a representation of the
compound's current chemical and conformational
structure to generate a proposed representation;

(b) accepting and storing the proposed structure
according to a probability depending on an energy
determined for the proposed structure; and

(c) repeating these steps until sufficient structures
have been stored to permit statistically significant
determination of an equilibrium structure.

52. The method of claim 51 wherein the conformational degrees
of freedom comprise torsional rotations about mutual
bonds between otherwise rigid subunits of the compound,
each rigid unit's representation comprising its
interconnections and atomic composition, each atom's
representation comprising its type and position, the
torsional rotations respecting any conformational
constraints present.

53. The method of claim 51 wherein the compound is a peptide,
peptide derivative, or peptide analog. 

54. The method of claim 51 wherein the conformational 
alterations comprise constrained, concerted torsional
rotations.

55. The method of claim 54 wherein the constrained, concerted 
torsional rotations are constrained so that no more than 
four rigid units are spatially displaced. 


56. The method of claim 51 wherein the conformational 
alterations comprise removal of a side chain and regrowth 
of the side chain with a new torsional conformation. 

57. The method of claim 51 wherein the proposed structures
are generated with a bias toward more acceptable 
configurations of lower energy. 

58. The method of claim 51 wherein the energy is determined 
for the proposed structure by a method comprising
including constraint terms which represent knowledge of
measured structure for the compound.

59. The method of claim 58 wherein the constraint terms
comprise a weighted sum of squares of differences of the
actual and measured structures.

60. The method of claim 51 applied to a plurality of
compounds of limited conformational degrees of freedom
all of which bind to the same target molecule wherein the
method further comprises a step of selecting a plurality
of candidate pharmacophores based on rules of chemical
homology.

61. The method of claim 60 wherein the energy is determined 
for the proposed structure of one of the plurality of
compounds by a method comprising including consensus
terms which represent knowledge that the compounds all
bind to the same target molecule.

62. The method of claim 61 wherein the consensus terms are a 
weighted sum of squares of differences in the atomic
positions of a candidate pharmacophore of said one of the
plurality of compounds from the average values of these
positions in all the compounds.

63. The method of claim 61 which further comprises a step of
determining a consensus pharmacophore structure by
determining from the plurality of selected candidate
pharmacophores that candidate pharmacophore for which the
consensus terms are minimum compared to other candidate
pharmacophores.

64. The method of claim 60 which further comprises a step of 
determining a consensus pharmacophore structure by
determining from the plurality of selected candidate
pharmacophores that candidate pharmacophore for which the
consensus terms are relatively small compared to the
total energy.

65. The method of claim 63 or 64 which further comprises a
step of determining one or more lead compounds for use as
a drug which share a pharmacophore specification with the
determined consensus pharmacophore structure.

66. The method of claim 51 wherein the equilibrium structure
is determined by a method comprising averaging selected
generated and accepted structures.

67. The method of claim 66 wherein the averaging of
structures comprises clustering selected generated and
accepted structures into sets of similar structures and
averaging these sets.

68. An apparatus for configurational bias Monte Carlo
determination of the structure of a compound having
limited conformational degrees of freedom at a
temperature of interest, the apparatus comprising:

(a) memory means for storing

(i) data structures representing the compound's
chemical and conformational structure
consistently with the compound's degrees of
freedom,

(ii) similar data structures representing the
compound's proposed structure and prior
structures, and

(iii) parameters representing atomic interactions,
and

(b) processor means for executing programs for

(i) generating a proposed structure by making
conformational alterations consistent with the
conformational degrees of freedom and with a
bias toward more acceptable configurations of
lower energy,

(ii) accepting and storing the proposed structure
according to a probability depending on an
energy determined for the proposed structure,
and

(iii) repeating these steps until sufficient
structures have been stored to permit
statistically significant determination of an
equilibrium structure.

69. The apparatus of claim 68 wherein the conformational
degrees of freedom comprise torsional rotations about
mutual bonds between otherwise rigid subunits of the
compound, each rigid unit's representation comprising its
interconnections and atomic composition, each atom's
representation comprising its type and position, the
torsional rotations respecting any conformational
constraints present.


70. The apparatus of claim 68 wherein the compound is a
peptide, peptide derivative, or peptide analog. 

71. The apparatus of claim 68 wherein the memory, processor, 
and control means are configured from a workstation type
digital computer comprising RAM memory, disk memory,
processor, and input and display devices.

72. The apparatus of claim 68 wherein the conformational 
alterations made by the processor means further comprise
constrained, concerted torsional rotations or removal of
a side chain and regrowth of the side chain with a new
torsional conformation.

73. The apparatus of claim 72 wherein the constrained,
concerted torsional rotations are constrained so that no
more than four rigid units are spatially displaced.

74. The apparatus of claim 68 wherein the processor means 
determines an energy for the proposed structure by a
method comprising including constraint terms which
represent knowledge of measured structure for the
compound.

75. The apparatus of claim 74 wherein the constraint terms
comprise a weighted sum of squares of differences of the
actual and measured structures.

76. The apparatus of claim 68 applied to a plurality of
compounds of limited conformational degrees of freedom
all of which bind to the same target molecule, and
wherein the processor means further comprises programs
for selecting a plurality of candidate pharmacophores
based on rules of chemical homology.

77. The apparatus of claim 76 wherein the processor means
determines an energy for the proposed structure of any
one compound by a method comprising including consensus
terms which represent knowledge that the compounds all
bind to the same target molecule.

78. The apparatus of claim 77 wherein the consensus terms are 
a weighted sum of squares of differences in the atomic
positions of the candidate pharmacophore of said one
compound from the average values of these positions in
all the compounds.

79. The apparatus of claim 77 wherein the processor means
further comprises programs for determining a consensus
pharmacophore structure by determining from the plurality
of selected candidate pharmacophores a candidate
pharmacophore for which the consensus terms are minimum
compared to other candidate pharmacophores.

80. The apparatus of claim 77 wherein the processor means 
further comprises programs for determining a consensus
pharmacophore structure by determining from the plurality
of selected candidate pharmacophores a candidate
pharmacophore for which the consensus terms are
relatively small compared to the total energy.

81. The apparatus of claim 79 or 80 wherein the processor
means further comprises programs for determining one or
more lead compounds for use as a drug that share a
pharmacophore specification with the consensus
pharmacophore structure.

82. The apparatus of claim 68 wherein the processor means
determines an equilibrium structure by a method
comprising averaging selected generated and accepted
structures.

83. The apparatus of claim 82 wherein the averaging of
structures further comprises clustering selected
generated and accepted structures into sets of similar
structures and averaging these sets.

84. In a digital computer, apparatus for configurational bias
Monte Carlo determination of the structure of at least
one compound having limited conformational degrees of
freedom at a temperature of interest, said apparatus
comprising:

(a) first memory means for storing data structures
representing the compound's chemical and
conformational structure consistently with the
compound's degrees of freedom,

(b) second memory means for storing similar data
structures representing the compound's proposed
structure,

(c) third memory means for storing similar data
structures representing the compound's prior
structures,

(d) first processor means for generating a proposed
structure by making conformational alterations
consistent with the conformational degrees of
freedom and with a bias toward conformations of
lower energy,

(e) second processor means for accepting and storing the
proposed structure according to a probability
depending on an energy determined for the proposed
structure, and

(f) third processor means for controlling and repeating
the generation and acceptance until sufficient
structures have been stored to permit statistically
significant determination of an equilibrium
structure.



85. The digital computer apparatus of claim 84 wherein the 
conformational degrees of freedom comprise torsional
rotations about mutual bonds between otherwise rigid
subunits of the compound, each rigid unit's
representation comprising its interconnections and atomic
composition, each atom's representation comprising its
type and position, the torsional rotations respecting any
conformational constraints present.

86. The digital computer apparatus of claim 84 wherein the 
compound is a peptide, peptide derivative, or peptide 
analog. 

87. The digital computer apparatus of claim 84 wherein the
digital computer is a workstation type digital computer 
comprising RAM memory, disk memory, processor, and input 
and display devices. 

88. The digital computer apparatus of claim 84 wherein the
conformational alterations generated by the first 
processor means comprise constrained, concerted torsional 
rotations or removal of a side chain and regrowth of the 
side chain with a new torsional conformation. 


89. The digital computer apparatus of claim 88 wherein the 
constrained, concerted torsional rotations are 
constrained so that no more than four rigid units are 
spatially displaced. 

90. The digital computer apparatus of claim 84 wherein the
second processor means determines an energy for the
proposed structure by a method comprising including
constraint terms which represent knowledge of measured
structure for the compound.



91. The digital computer apparatus of claim 90 wherein the
constraint terms comprise a weighted sum of squares of 
differences of the actual and measured structures. 



92. The digital computer apparatus of claim 84 in which said 
at least one compound is a plurality of compounds of
limited conformational degrees of freedom all of which
bind to the same target and wherein data are stored in
said first memory means representing the chemical and
conformational structure of said plurality of compounds
and wherein the apparatus further comprises additional
processor means for selecting a plurality of candidate
pharmacophores based on rules of chemical homology.

93. The digital computer apparatus of claim 92 wherein the 
second processor means determines an energy for the
proposed structure of one of said plurality of compounds
by a method comprising including consensus terms which
represent knowledge that the compounds all bind to the
same target molecule.

94. The digital computer apparatus of claim 92 wherein the 
consensus terms are a weighted sum of squares of 
differences in the atomic positions of a candidate 
pharmacophore of said one of the plurality of compounds
from the average values of these positions in all the
compounds.



95. The digital computer apparatus of claim 93 wherein the 
apparatus further comprises processor means for
determining a consensus pharmacophore structure by
determining from the plurality of selected candidate
pharmacophores a candidate pharmacophore for which the
consensus terms are relatively small compared to the
total energy.

96. The digital computer apparatus of claim 93 wherein the 
apparatus further comprises processor means for
determining a consensus pharmacophore structure by
determining from the plurality of selected candidate
pharmacophores a candidate pharmacophore for which the
consensus terms are minimum compared to other candidate
pharmacophores.

97. The digital computer apparatus of claim 95 or 96 wherein
the apparatus further comprises processor means for
determining one or more lead compounds for use as a drug
that share a pharmacophore specification with the
consensus pharmacophore structure.

98. The digital computer apparatus of claim 84 wherein the 
third processor means determines an equilibrium structure
by a method comprising averaging selected generated and
accepted structures.

99. The digital computer apparatus of claim 98 wherein the
averaging of structures comprises clustering selected
generated and accepted structures into sets of similar
structures and averaging these sets.

100. In a digital computer, apparatus for conf igurational bias 
Monte Carlo determination of the structure of a plurality 
30 of compounds having limited conformational degrees of 

freedom, each compound having a backbone and side chains, 
said apparatus comprising: 

(a) first memory means for storing data structures 
representing each compound *s chemical and 
35 conformational structure consistently with that 

compound's degrees of freedom and constraints^ 
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(b) second men ry m ans f r storing similar data 
structures representing a proposed structure for one 
or more of the compounds, 

(c) third memory means for storing similar data 

5 structures representing prior structures of the 

plurality of compounds^ 

(d) first processor means for generating a proposed 
structure of a randomly selected compound by making 
conformational alterations consistent with the 

10 conformational degrees of freedom, the 

conformational alterations being randomly 
distributed between alterations that alter the 
structure of a randomly selected side chain of the 
selected compound and alterations that alter the 

15 structure of a randomly selected region of the 

backbone of the selected compound, the proposed 
structure being stored in the second memory means, 
the proposed structure being generated with a bias 
toward more acceptable structures of lower energy, 

20 whereby the method is made more efficient, 

(e) second processor means for accepting a proposed
structure according to a probability depending on an
energy determined for the proposed structure, the
energy including terms representing physical
interactions and terms representing heuristic
information about the compound's structure, the
heuristic information comprising knowledge about
measured distances in one or more compounds of said
plurality and about the plurality of the compounds
binding to a same target molecule,

(f) third processor means for controlling and repeating
these steps until sufficient structures have been
generated and accepted to permit statistically
significant determination of an equilibrium
structure.
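
For illustration, an energy of the kind recited in element (e) of claim 100, combining physical interaction terms with heuristic terms, might be sketched as below. This is a hedged Python sketch under stated assumptions: the 12-6 pair potential, the harmonic penalty forms, the force constants, and all function names are hypothetical choices for the example and are not taken from the specification.

```python
import math
from itertools import combinations
from typing import Dict, List, Tuple

Point = Tuple[float, float, float]


def lennard_jones(coords: List[Point], epsilon: float = 1.0, sigma: float = 1.0) -> float:
    """Toy physical term: 12-6 interaction summed over all atom pairs."""
    e = 0.0
    for a, b in combinations(coords, 2):
        x = (sigma / math.dist(a, b)) ** 6
        e += 4.0 * epsilon * (x * x - x)
    return e


def distance_restraint(coords: List[Point],
                       restraints: Dict[Tuple[int, int], float],
                       k: float = 10.0) -> float:
    """Heuristic term: harmonic penalty for deviating from measured distances.
    `restraints` maps an atom index pair (i, j) to its measured distance."""
    return sum(k * (math.dist(coords[i], coords[j]) - d0) ** 2
               for (i, j), d0 in restraints.items())


def shared_site_penalty(compounds: List[List[Point]],
                        sites: List[Tuple[int, int]],
                        k: float = 5.0) -> float:
    """Heuristic term (assumed form): compounds binding the same target are
    pushed toward a common distance between two designated atoms.
    `sites[c]` gives the two atom indices used for compound c."""
    d = [math.dist(compounds[c][i], compounds[c][j]) for c, (i, j) in enumerate(sites)]
    mean = sum(d) / len(d)
    return sum(k * (x - mean) ** 2 for x in d)


def total_energy(compounds: List[List[Point]],
                 restraints: List[Dict[Tuple[int, int], float]],
                 sites: List[Tuple[int, int]]) -> float:
    """Sum of the physical term and both heuristic terms over all compounds."""
    physical = sum(lennard_jones(c) for c in compounds)
    measured = sum(distance_restraint(c, r) for c, r in zip(compounds, restraints))
    return physical + measured + shared_site_penalty(compounds, sites)
```

A proposed structure that violates a measured distance, or that departs from the geometry shared by the other compounds, receives a higher total energy and is therefore less likely to be accepted.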

101. The digital computer of claim 100 wherein the
conformational degrees of freedom comprise torsional
rotations about mutual bonds between otherwise rigid
subunits of the compound, each rigid unit's
representation comprising its interconnections and atomic
composition, each atom's representation comprising its
type and position, the torsional rotations respecting any
conformational constraints present.
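
The representation recited in claim 101, rigid units holding typed, positioned atoms and rotating about shared bonds, might be sketched as follows. This is an illustrative Python sketch only; the class names `Atom` and `RigidUnit`, the field names, and the use of Rodrigues' rotation formula are assumptions for the example and not the data structures of the program listings.

```python
import math
from dataclasses import dataclass, field
from typing import List, Tuple

Vec = Tuple[float, float, float]


@dataclass
class Atom:
    type: str      # atom type label, e.g. "C" or "N"
    position: Vec  # Cartesian coordinates


@dataclass
class RigidUnit:
    atoms: List[Atom]                                      # atomic composition
    bonded_units: List[int] = field(default_factory=list)  # interconnections (unit indices)


def rotate_about_axis(p: Vec, origin: Vec, axis_end: Vec, angle: float) -> Vec:
    """Rotate point p by `angle` radians about the bond origin -> axis_end
    using Rodrigues' rotation formula."""
    ax = tuple(e - o for e, o in zip(axis_end, origin))
    norm = math.sqrt(sum(c * c for c in ax))
    k = tuple(c / norm for c in ax)
    v = tuple(a - o for a, o in zip(p, origin))
    cos_a, sin_a = math.cos(angle), math.sin(angle)
    k_dot_v = sum(a * b for a, b in zip(k, v))
    k_cross_v = (
        k[1] * v[2] - k[2] * v[1],
        k[2] * v[0] - k[0] * v[2],
        k[0] * v[1] - k[1] * v[0],
    )
    rotated = tuple(
        v[i] * cos_a + k_cross_v[i] * sin_a + k[i] * k_dot_v * (1.0 - cos_a)
        for i in range(3)
    )
    return tuple(r + o for r, o in zip(rotated, origin))


def apply_torsion(unit: RigidUnit, origin: Vec, axis_end: Vec, angle: float) -> None:
    """Rotate every atom of a rigid unit about the shared bond.
    Distances inside the unit are unchanged, so the unit stays rigid."""
    for atom in unit.atoms:
        atom.position = rotate_about_axis(atom.position, origin, axis_end, angle)
```

Because only the torsion angle changes, bond lengths and angles within each unit are preserved by construction. Checking any conformational constraints before committing a rotation is left out of this sketch.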

102. The digital computer of claim 100 wherein the compound is
a peptide, peptide derivative, or peptide analog.

103. A method of configurational bias Monte Carlo
determination of the structure of a compound selected
from the group consisting of a peptide, peptide
derivative, and peptide analog, the method comprising the
steps of:

(a) representing the conformation of the compound by
interconnected rigid units capable of torsional
rotation about common bonds, each rigid unit's
representation comprising its interconnections and
atomic composition, each atom's representation
comprising its type and position,

(b) generating a proposed structure by making
conformational alterations consistent with the
compound's structure,

(c) accepting a proposed structure according to a 
probability depending on an energy determined for 
the proposed structure, and 

(d) repeating these steps until sufficient structures 
have been generated and accepted to permit 
statistically significant determination of an 
equilibrium structure. 
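
Steps (b) through (d) of claim 103 can be illustrated with a plain Metropolis loop over a single torsion angle. This is a minimal Python sketch under stated assumptions: the Metropolis acceptance rule is one common choice of "a probability depending on an energy", the fixed step count stands in for a statistical stopping test, and the names `metropolis_accept` and `sample_torsion` are hypothetical.

```python
import math
import random
from typing import Callable, List


def metropolis_accept(e_old: float, e_new: float, kT: float, rng: random.Random) -> bool:
    """Accept downhill moves always, uphill moves with probability exp(-dE/kT)."""
    d_e = e_new - e_old
    return d_e <= 0.0 or rng.random() < math.exp(-d_e / kT)


def sample_torsion(energy: Callable[[float], float],
                   n_steps: int,
                   kT: float = 1.0,
                   max_step: float = 0.5,
                   seed: int = 0) -> List[float]:
    """Generate, accept or reject, and repeat for one torsion angle (radians)."""
    rng = random.Random(seed)
    phi = 0.0
    e = energy(phi)
    samples: List[float] = []
    for _ in range(n_steps):
        trial = phi + rng.uniform(-max_step, max_step)  # step (b): propose
        e_trial = energy(trial)
        if metropolis_accept(e, e_trial, kT, rng):      # step (c): accept
            phi, e = trial, e_trial
        samples.append(phi)  # a rejected move counts the current state again
    return samples                                      # step (d): repeat
```

Averaging the returned samples gives an estimate of the equilibrium value of the angle. In practice the run length would be chosen so that the estimate is statistically significant.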

104. An apparatus for configurational bias Monte Carlo
determination of the structure of a compound selected
from the group consisting of a peptide, peptide
derivative, and peptide analog, the apparatus comprising:

(a) memory means for storing

(i) data structures representing the compound's
conformation as interconnected rigid units
capable of torsional rotation about common
bonds, each rigid unit's representation
comprising its interconnections and atomic
composition, each atom's representation
comprising its type and position,

(ii) similar data structures representing the
compound's proposed structure and prior
structures, and

(iii) parameters representing atomic interactions,
and

(b) processor means for executing programs for

(i) generating a proposed structure by making
conformational alterations consistent with the
compound's structure and with a bias toward
more acceptable configurations of lower energy,

(ii) accepting a proposed structure according to a
probability depending on an energy determined
for the proposed structure, and

(iii) repeating these steps until sufficient
structures have been generated and accepted to
permit statistically significant determination
of an equilibrium structure.
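
The biased generation of element (b)(i) of claim 104 might be illustrated for a single torsion angle as below. This is a hedged Python sketch of the textbook configurational bias scheme with Rosenbluth weights, not the program listings of Sec. 8. The uniform trial distribution, the number of trials `k`, and the function names are assumptions made for the example.

```python
import math
import random
from typing import Callable, List, Tuple

Energy = Callable[[float], float]


def biased_trial(energy: Energy, k: int, kT: float, rng: random.Random) -> Tuple[float, float]:
    """Draw k uniform trial angles and pick one with Boltzmann probability,
    which biases the proposal toward lower energy.
    Returns the chosen angle and the Rosenbluth weight of the trial set."""
    trials = [rng.uniform(-math.pi, math.pi) for _ in range(k)]
    weights = [math.exp(-energy(t) / kT) for t in trials]
    chosen = rng.choices(trials, weights=weights, k=1)[0]
    return chosen, sum(weights)


def old_weight(energy: Energy, phi_old: float, k: int, kT: float, rng: random.Random) -> float:
    """Rosenbluth weight of the current angle: itself plus k-1 fresh trials."""
    others = [rng.uniform(-math.pi, math.pi) for _ in range(k - 1)]
    return sum(math.exp(-energy(t) / kT) for t in [phi_old] + others)


def cbmc_step(energy: Energy, phi: float, k: int, kT: float, rng: random.Random) -> float:
    """One biased move, accepted with probability min(1, W_new / W_old).
    The weight ratio removes the bias introduced by the trial selection."""
    phi_new, w_new = biased_trial(energy, k, kT, rng)
    w_old = old_weight(energy, phi, k, kT, rng)
    if rng.random() < min(1.0, w_new / w_old):
        return phi_new
    return phi


def sample_cbmc(energy: Energy, n_steps: int, k: int = 8,
                kT: float = 1.0, seed: int = 0) -> List[float]:
    """Repeat biased moves and record the angle after every step."""
    rng = random.Random(seed)
    phi = 0.0
    samples: List[float] = []
    for _ in range(n_steps):
        phi = cbmc_step(energy, phi, k, kT, rng)
        samples.append(phi)
    return samples
```

With `k = 1` the scheme reduces to ordinary Metropolis sampling with uniform proposals. Larger `k` raises the acceptance rate at the cost of more energy evaluations per step.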



105. In a digital computer, apparatus for configurational bias
Monte Carlo determination of the structure of a compound
selected from the group consisting of a peptide, peptide
derivative, and peptide analog, said apparatus
comprising:

(a) first memory means for storing data structures
representing the compound's structure as
interconnected rigid units capable of torsional
rotation about common bonds, each rigid unit's
representation comprising its interconnections and
atomic composition, each atom's representation
comprising its type and position,

(b) second memory means for storing similar data
structures representing the compound's proposed
structure,

(c) third memory means for storing similar data
structures representing the compound's prior
structures,

(d) first processor means for generating a proposed
structure by making conformational alterations
consistent with the compound's structure and
constraints and with a bias toward conformations of
lower energy,

(e) second processor means for accepting a proposed
structure according to a probability depending on an
energy determined for the proposed structure, and

(f) third processor means for controlling and repeating
these steps until sufficient structures have been
generated and accepted to permit statistically
significant determination of an equilibrium
structure.

106. In a digital computer, apparatus for configurational bias
Monte Carlo determination of the structure of a plurality
of compounds selected from the group consisting of
peptides, peptide derivatives, and peptide analogs, each
compound having a backbone and side chains, said
apparatus comprising:

(a) first memory means for storing data structures
representing each compound's structure as
interconnected rigid units capable of torsional
rotation about common bonds, each rigid unit's
representation comprising its interconnections and
atomic composition, each atom's representation
comprising its type and position,

(b) second memory means for storing similar data
structures representing a proposed structure for one
or more of the compounds,

(c) third memory means for storing similar data
structures representing prior structures of the
plurality of the compounds,

(d) first processor means for generating a proposed
structure of a randomly selected compound by making
conformational alterations consistent with the
compound's structure, the conformational alterations
being randomly distributed between alterations that
alter the structure of a randomly selected side
chain of the selected compound and alterations that
alter the structure of a randomly selected region of
the backbone of the selected compound, the proposed
structure being stored in the second memory means,
the proposed structure being generated with a bias
toward more acceptable structures of lower energy,

(e) second processor means for accepting a proposed
structure according to a probability depending on an
energy determined for the proposed structure, the
energy including terms representing physical
interactions and terms representing heuristic
information about the compound's structure, the
heuristic information comprising knowledge about
measured distances in one or more compounds of said
plurality and about the plurality of the compounds
binding to a same target molecule,

(f) third processor means for controlling and repeating
these steps until sufficient structures have been
generated and accepted to permit statistically
significant determination of an equilibrium
structure.
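
The random distribution of alterations recited in element (d) of claim 106, over compounds and between side-chain and backbone moves, might be sketched as follows. This Python sketch is illustrative only: the `Compound` record, its field names, and the mixing probability `p_side_chain` are assumptions for the example, since the claim does not fix how the alterations are distributed.

```python
import random
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Compound:
    name: str
    n_side_chains: int        # number of side chains that can be altered
    n_backbone_regions: int   # number of backbone regions that can be altered


def choose_move(compounds: List[Compound],
                rng: random.Random,
                p_side_chain: float = 0.5) -> Tuple[int, str, int]:
    """Pick a random compound, then a random side chain or backbone region.
    Returns (compound index, move kind, index of the chain or region)."""
    c = rng.randrange(len(compounds))
    if rng.random() < p_side_chain:
        return c, "side_chain", rng.randrange(compounds[c].n_side_chains)
    return c, "backbone", rng.randrange(compounds[c].n_backbone_regions)
```

The selected move would then be passed to the biased structure generation and the acceptance test of elements (d) and (e).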

107. The method of claim 42 wherein the nuclear magnetic
resonance is rotational echo double resonance.